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ABSTRACT 

The ever increasing size and complexity of data coming from simulations of cosmic struc- 
ture formation demands equally sophisticated tools for their analysis. During the past decade, 
the art of object finding in these simulations has hence developed into an important discipline 
itself. A multitude of codes based upon a huge variety of methods and techniques have been 
spawned yet the question remained as to whether or not they will provide the same (physical) 
information about the structures of interest. This concern gave rise to a series of workshops 
and comparison papers of which we here present a brief summary. But we also go one step 
further and investigate in more detail the (possible) origin of any deviations across finders. 
To this extent we decipher and discuss differences in halo finding methods, clearly separating 
them from the disparity in definitions of halo properties. We observe that different codes not 
only find different numbers of objects leading to a scatter of up to 20 per cent in the halo mass 
and T4nax function, but also that the particulars of those objects that are identified by all finders 
differ. The strength of the variation, however, depends on the property studied, e.g. the scatter 
in position, bulk velocity, mass, and the peak value of the rotation curve is practically below 
a few per cent, whereas derived quantities such as spin and shape show larger deviations. Our 
study indicates that the prime contribution to differences in halo properties across codes stems 
from the distinct particle collection methods and - to a minor extent - the particular aspects 
of how the procedure for removing unbound particles is implemented. 

We close with a discussion of the relevance and implications of the scatter across dif- 
ferent codes for other fields such as semi-analytical galaxy formation models, gravitational 
lensing, and observables in general. In summary we conclude that while the majority of codes 
give results with a scatter below 1 per cent (at least for the most basic halo properties) once a 
particular definition has been specified, the residual differences for more complex quantities 
such as spin parameter are more sensitive to the particular implementation of the initial parti- 
cle collection. We therefore caution any user of halo finders and their catalogues to not use the 
data blindly but to consider the mode of operation and definitions used during its generation. 

Key words: methods: A^-body simulations - galaxies; haloes - galaxies; evolution - cosmol- 
ogy: theory - dark matter 



codes and to optimise their predictive power one needs equally so- 
phisticated structure finders. 



1 INTRODUCTION 

Over the last 30 years great progress has been made in the de- 
velopment of simulation codes that model the distribution of the 
dissipationless dark matter that makes up most of the Universe's 
dynamical mass. Some codes also simultaneously follow the sub- 
stantially more complex baryonic physics of the visible and hence 
directly observable Universe. Nowadays we have a great variety 
of highly reliable, cost effective (and in some cases publicly avail- 
able) codes designed for the simulation of cosmic structure forma- 
tion (e.g. Couchman et al. 1995; Pen 1995; Gnedin 1995; Kravtsov 
et al. 1997; Fryxell et al. 2000; Bode et al. 2000; Springel et al. 
2001; Knebe et al. 2001; Teyssier 2002; O'Shea et al. 2004; Quihs 
2004; Dubinski et al. 2004; Merz et al. 2005; Springel 2005; Bagla 
& Khandai 2009; Springel 2010; Doumler & Knebe 2010). How- 
ever, producing the data is only the first step in the process; the en- 
sembles of billions of tracers generated still require interpreting so 
that their distribution may be somehow compared to the real Uni- 
verse. This necessitates access to analysis tools to map the phase- 
space which is being sampled by the tracers onto 'real' objects in 
the Universe. Therefore, to take advantage of sophisticated A^-body 
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Halo finders mine TV-body data to find locally over-dense (ei- 
ther in configuration or phase-space) gravitationally bound sys- 
tems, i.e. the dark matter haloes we currently believe surround 
galaxies. This type of analysis has led to critical insights into our 
understanding of the origin and evolution of cosmic structure and 
galaxies. Theoretically, the properties of the simulated objects are 
often reduced to readily usable functional forms, e.g. the dark mat- 
ter halo density profile, (Navarro et al. 1996, 1997; Moore et al. 
1999), concentration-mass relation (Bullock et al. 2001; Wechsler 
et al. 2002), mass accretion histories (Wechsler et al. 2002; De Lu- 
cia et al. 2004; Diemand et al. 2007; De Lucia & Blaizot 2007; 
Behroozi et al. 2012), shape distributions (Dubinski & Carlberg 
1991; Kauffmann et al. 1999; Bullock et al. 2001), clustering prop- 
erties (Mo & White 1996; Smith et al. 2003), environmental ef- 
fects (Baugh et al. 1996; Moore et al. 1996), merger rates (Lacey 
& Cole 1994; Bower et al. 2006; Fakhouri & Ma 2008; Fakhouri 
et al. 2010; Behroozi et al. 2012), disruption timescales (Ghigna 
et al. 1998; Zentner et al. 2005). All these properties and inter- 
connections have been derived from simulations, encoded using 
analytical formulae, and subsequently been used as input in, for 
instance, semi-analytical models (e.g. Cole et al. 1994, 2000; Cro- 
ton et al. 2006), gravitational lensing calculations (e.g. Kaiser & 
Squires 1993; Bartelmann et al. 1998; Diemand et al. 2008), or di- 
rectly in comparison to observations (e.g. Davis et al. 1985; Klypin 
et al. 1999; Springel et al. 2005; Komatsu et al. 201 1). And one of 
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Table 1. Chronological list of halo finders and methods since the dawn of 
computational cosmology. 



year 


method/code 


reference 


1974 


SO 


Press &Schechter (1974) 


1985 


FOF 


Davis et al. (1985) 


1991 


DENMAX 


Bertschinger & Gelb (1991) 


1994 


SO 


Lacey& Cole (1994) 


1995 


adaptive FOF 


van Kampen (1995) 


1996 


IsoDen 


Pfitzner& Salmon (1996) 


1997 


BDM 


Klypin&Holtzman(1997) 


1998 


HOP 


Eisenstein& Hut (1998) 


1999 


hierarchical FOF 


Gottloberetal. (1999) 


2001 


SKID 


Stadel (2001) 


2001 


enhanced BDM 


Bullock et al. (2001) 


2001 


SUBFIND 


Springeletal. (2001) 


2004 


MHF & MHT 


Gill et al. (2004) 


2004 


ADAPTAHOP 


Aubert et al. (2004) 


2004 


DENMAX^ 


Neyrinck et al. (2004) 


2004 


SURV 


Tormen et al. (2004) 


2005 


improved DENMAX 


Weller et al. (2005) 


2005 


VOBOZ 


Neyrinck et al. (2005) 


2006 


PSB 


Kim & Park (2006) 


2006 


6DFOF 


Diemand et al. (2006) 


2007 


further improved DENMAX 


Shaw et al. (2007) 


2009 


HSF 


Maciejewski et al. (2009) 


2009 


LANL finder 


Habib et al. (2009) 


2009 


AHF 


Knollmann & Knebe (2009) 


2010 


phop 


Skoryetal. (2010) 


2010 


ASOHF 


Planelles & Quilis (2010) 


2010 


pSO 


Sutter &Ricker (2010) 


2010 


pFOF 


Raseraetal. (2010) 


2010 


ORIGAMI 


Falck et al. (2012) 


2010 


HOT 


Ascasibar, in prep. 


2010 


ROCKSTAR 


Behroozi et al. (2013) 


2010 


Mendieta 


Sgro et al. (2010) 


2010 


enhanced SURV 


GiocoU et al. (2010) 


2011 


HBT 


Han etal. (2012) 


2011 


STF 


Elahietal. (2011) 


2012 


GRASSHOPPER 


Stadel et al., in prep. 


2012 


JUMP-D 


Casado et al., in prep. 



the questions we will address here is whether some or all of these 
relations depend sensitively upon the choice of the applied halo 
finder? 



1.1 History of Halo Finding 

While for decades the focus was on getting the simulations them- 
selves under control, it is now obvious that halo finding is equally 
important and, unfortunately, not that well understood as yet. Or to 
put it another way, producing the raw simulation data is only the 
first step in the process; the model requires reduction before it can 
be compared to the observed Universe we inhabit. In recent years, 
this field has also seen great development in the number and variety 
of object finders as shown in Table 1 , where we chronologically list 
the emergence of codes or methods. We can clearly see the increas- 
ing pace of development in the past decade reflecting the necessity 
for state-of-the-art codes: in the last ten years the number of ex- 
isting halo finding codes has practically tripled. While for a long 
time the spherical overdensity method first mentioned by Press & 
Schechter (SO, 1974) as well as the friend-of- friends algorithm in- 



troduced^ by Davis et al. (FOF, 1985) remained the standard tech- 
niques, the situation changed in the 90's when new methods were 
developed (Gelb 1992; Lacey & Cole 1994; van Kampen 1995; 
Pfitzner & Salmon 1996; Klypin & Holtzman 1997; Eisenstein & 
Hut 1998; Gottlober et al. 1999). 

While the first generation of halo finders primarily focused on 
identifying isolated field haloes the situation dramatically changed 
once it became clear that there was no such thing as 'overmerg- 
ing' : the premature destruction of haloes orbiting inside larger host 
haloes (Klypin et al. 1999) was a numerical artifact rather than a 
real physical process. Now post-processing tools face the challenge 
of finding both haloes embedded within the (more or less uniform) 
background density of the Universe as well as subhaloes orbiting 
within the density gradient of a larger host halo. The past decade 
has seen a substantial number of codes and techniques introduced 
in an attempt to cope with this problem (Stadel 2001 ; Bullock et al. 
2001; Springel et al. 2001; Aubert et al. 2004; Gifl et al. 2004; 
Weller et al. 2005; Neyrinck et al. 2005; Kim & Park 2006; Die- 
mand et al. 2006; Shaw et al. 2007; Gardner et al. 2007a,b; Ma- 
ciejewski et al. 2009; Knollmann & Knebe 2009; Planelles & Quilis 
2010; Behroozi et al. 2013; Han et al. 2012; Elahi et al. 201 1). One 
approach was to make use of the additional information available 
in a simulation where all six phase-space variables are typically 
known. Additionally, some modem finders make use of the time co- 
ordinate too, as large structures are not expected to suddenly appear 
out of nothing. The use of such extra information makes possible 
the investigation of structures beyond the traditional bound objects. 
For instance disrupted objects can be studied either by tracking the 
debris from a once known object that has been disrupted or identi- 
fying such an object as a distinct entity in six-dimensional phase- 
space (see Elahi et al., submitted). Streams of stars are of course 
a highly topical example of work relevant to near-field cosmology 
(e.g. Belokurov et al. 2006). 

Further, as simulations became much larger this also led to a 
trend towards parallel analysis tools. The simulation data had be- 
come too large to be analysed on single CPU architectures and 
hence halo finders had to be adjusted to cope with this. The re- 
cent profusion of new codes is also a reflection of the drive to 
build halo finders and associated analysis tools into the simulation 
codes themselves. Such an approach obviates the need to frequently 
save the raw simulation output, instead only requiring the storing 
of much smaller reduced catalogues of the interesting structures 
and their properties (Angulo et al. 2012). This approach of course 
founders if the analysis applied is either not robust or incomplete 
in some way, as there is no longer any ability to return to the raw 
data and reprocess it without rerunning the entire simulation. 

For the upcoming generation of trillion particle production 
simulations that represent the forefront of numerical cosmology 
the approach of storing only the reduced halo catalogues there- 
fore appears to be essential unless a dramatic storage breakthrough 
is made. With such a clear direction it is essential that any post- 
processing scheme adopted is both robust and well understood. 
This is one of the key drivers of this entire project. 



^ This is strictly speaking only true for astrophysics as the friends-of- 
friends method is widely used in molecular dynamics since the 1960's to 
find bound clusters (e.g. liquid droplets in a gas). In that field it is well 
known as the "Stillinger method" (Stillinger 1963). 
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1.2 What is a Halo? 

Although the question of what is a halo appears straightforward a 
direct answer is not immediately obvious and has been the subject 
of some previous studies already (e.g. Maccio et al. 2003; Prada 
et al. 2006; Cuesta et al. 2008; Anderhalden & Diemand 2011; 
Diemer et al. 2012). While one can argue that a halo is a 'gravi- 
tationally bound object' (cf. Knebe et al. 201 1), this still leaves the 
definition of the outer edge unresolved. While we go on to discuss 
this topic in more detail in the following Sections, we nevertheless 
consider it sufficiently important to attempt to address it here in the 
Introduction, too. 

Assuming for a moment that we agree upon the aforemen- 
tioned definition of boundness, we already face the problem that 
haloes may contain substructure: will the mass of the subhaloes 
(which certainly are also bound to the host itself) be considered 
part of the host or should they be excised from it? While the an- 
swer to this uncertainty depends on the scientific problem under in- 
vestigation (e.g. gravitational tensing studies would require the full 
mass, including substructure, whereas mass profile investigations 
would likely prefer to live without these extra peaks), one needs to 
be aware that some halo finders return halo masses including 'sub- 
masses' (e.g. AHF) while others do not (e.g. SUBFIND). 

Another - directly related and actually affected - question is 
that of the edge of the subhalo. Note that we liberally talk about the 
mass and edge at the same time as they are most commonly defined 
simultaneously via one equation (cf. Eq. (2) in Section 2.5 below), 
i.e. the mass enclosed within the radius of a halo has to be some 
multiple of a reference density (usually either the background or the 
critical density of the Universe) times the (spherical) volume de- 
fined by that radius. But as we have just asked, should sub-masses 
be included as well? It will certainly change the enclosed mass and 
hence radius of the object. Irrespective of this 'sub-mass issue', 
the edge of a halo is not a well-defined quantity. Even though the 
most commonly used working definition assumes spherical sym- 
metry and adopts some theory-driven pre-factor for the reference 
density based upon a spherical top-hat collapse, this pre-factor is 
nevertheless a loosely defined parameter for which different halo 
finders use slightly different (and possibly cosmology and redshift 
dependent) values. Additionally, it is not obvious that dark matter 
(sub-)haloes should be characterised by a spherical radius, espe- 
cially when they are subject to severe tidal distortion. Friends-of- 
Friends based finders bypass this problem by merely stating as the 
halo mass the sum of all the particle masses linked together by their 
favourite choice of linking length; for a more elaborate discussion 
about the relation between SO and EOF mass, please refer to Lukic 
et al. (2009) and More et al. (201 1). Thus the edge of EOF haloes 
are by definition non-spherical. And there are examples for halo 
finders that circumvent the conventional edge definition by linking 
the halo's boundary to the dynamics of the particles (ORIGAMI 
Ealck et al. 2012, cf. Section C15). 

Is either of these approaches a reasonable or suitable strategy? 
As we will explore in more detail later, one could also think of 
rather different definitions for the halo edge (and hence its mass) 
inspired by, for instance, a desire to truncate subhaloes at the saddle 
point of the density field or the tidal radius. 

An alternative approach to quantify the 'size' of a halo which 
avoids this problem is to use a related quantity rather than the mass: 
for instance, the peak of the rotation curve as characterised by 
Knax or the radial location of this peak by i?max. These quantities 
do indeed provide a physically-motivated scale (e.g. Ascasibar & 
Gottlober 2008). While the physical properties derived from parti- 



cles at distances beyond -Rmax might exhibit scatter and systematic 
trends arising from differ definitions of a halo's edge, the quantities 
derived from the inner regions such as -Rmax and Vixiax prove to be 
far stabler against such numerical uncertainties. Another advantage 
of using Vinax is that it is more closely related to observable prop- 
erties than the halo mass. However, the peak of the rotation curve 
is reached quite close to the centre of the halo, and its measurement 
is sensitive to numerical resolution. Being set by the central parti- 
cles, it is not very sensitive to tidal stripping, which may be seen as 
either an advantage or a disadvantage depending on the scientific 
question under study. 

In summary, this brief discussion should serve to alert users of 
halo catalogues that there are choices that need to be made before 
a halo catalogue can be produced. Different code authors naturally 
make difference choices, as often there is no 'correct' method and 
more often than not the definition adopted depends upon the prob- 
lem being addressed. In what follows we will discuss the range in 
derived properties that arises due to these different choices as well 
as addressing the question of whether or not the halo finders agree 
when applied to the same data set with a common set of assump- 
tions. 



1.3 The Workshops 

We initiated the halo finder comparison project that has brought 
together practically every expert/code developer in the field at a 
series of bi-annual workshops focusing on the comparison of their 
respective codes. 



1.3.1 Haloes going MAD 2010 

The start-up gathering and first comparison with respect to mock 
and field haloes: during the last week of May 2010 we held the 
workshop "Haloes going MAD" in Miraflores de la Sierra close 
to Madrid dedicated to the issues surrounding identifying haloes 
in cosmological simulations. Amongst other participants 15 halo 
finder representatives were present. The aim of this workshop was 
to define (and use!) a unique set of test scenarios for verifying the 
credibility and reliability of such programs. We applied each and 
every halo finder to our newly established suite of test cases and 
cross-compared the results. 

To date most halo finders were introduced (if at all) in their 
respective code papers which presented their underlying principles 
and generally subjected them to tests within a full cosmological en- 
vironment, primarily matching (sub-)halo mass functions to theo- 
retical models and fitting functions. Hence no general benchmarks 
such as the ones designed at this workshop existed prior to this 
meeting. Our newly devised suite of test cases is designed to be 
simple yet challenging enough to assist in establishing and gauging 
the credibility and functionality of all commonly employed halo 
finders. These tests include mock haloes with well defined prop- 
erties as well as a state-of-the-art cosmological simulation. They 
involve the identification of individual objects, various levels of 
substructure, and dynamically evolving systems. The cosmological 
simulation has been provided at various resolution levels with the 
best resolved containing a sufficient number of particles (1024^) 
that it can only presently be analysed in parallel. 

All the test cases and their analysis are publicly available from 
http://popia.ft.uam.es/HaloesGoingMAD under the 
tab "The Data". 
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1.3.2 Subhaloes going Notts 2012 

While "Haloes going MAD" primarily dealt with either mock halo 
set-ups containing well behaved substructure or field haloes, the 
next natural question was how halo finders perform and compare 
when it comes to subhaloes as found in high-resolution simulations. 
Within the hierarchical structure formation scenario (Davis et al. 
1985) the quantification of the amount of substructure (both obser- 
vationally and in simulations of structure formation) is an essential 
step towards what is nowadays referred to as "Near-Field Cosmol- 
ogy" (Freeman & Bland-Hawthorn 2002). We therefore utilized the 
data for one of the haloes from the Aquarius project (Springel et al. 
2008, courtesy VIRGO consortium^) that consists of multiple dark 
matter only re-simulations of a Milky Way like halo at a variety of 
resolutions performed using GADGETS (based upon GADGET2, 
Springel 2005). 

The follow-up meeting "Subhaloes going Notts" then took 
place during the second week of May 2012 in Dovedale, proba- 
bly one of the most remote locations in England. The focus of this 
meeting was to better understand the differences in (sub-)halo prop- 
erties that emerged during the analysis of the two comparison pa- 
pers Knebe et al. (2011) and Onions et al. (2012).'^ Furthermore, 
we also took the collaboration of the 9 code representatives present 
at that meeting and all other participants actively interested in halo 
finding one step further: having at our disposal the analysis and 
expertise for various codes in the field we intended to address sci- 
entific questions (as opposed to academic comparisons) using the 
'code scatter' as error bars on the results. To this extent we fo- 
cused on the development of a common post-processing pipeline. 
Further, a lot of effort during this meeting went into an improved 
understanding of where the differences between the codes came 
from. 

Again, all the test cases and their analysis are pub- 
licly available from http://popia.ft.uam.es/ 
SubhaloesGoingNotts following the instructions given 
under the tab "Data". 



1.4 Intention of this Work 

The aim of this paper is firstly to acquaint the reader with the 
general concepts commonly applied to the problem of finding ob- 
jects in simulations of cosmic structure formation. These assump- 
tions and choices underpin the production of halo catalogues that 
are then often used by other fields, for instance, semi-analytical 
galaxy formation, or - most importantly - direct observational 
comparisons. We address such questions as "what can I expect 
from halo finders?" as well as "to what accuracy can I trust these 
catalogues?". The latter is obviously of great relevance to anyone 
employing halo catalogues, especially as we have entered the era 
of precision cosmology (Smoot 2003; Primack 2005; Coles 2005; 
Primack 2007). 

In order to address such questions we first have to plunge into 
the details of halo finding. What are the various methods applied 
by the community and how do these different approaches drive any 
scatter in the derived halo properties? To this extent, parts of this 
article serve as a reference for anyone interested in the technical 
details which are discussed in Sections 2 and 5 where we describe 
the (sub-)halo finding methods and the technical issues on the way 



http://www.virgo.dur.ac.uk/ 
^ Note that the "Subhaloes going Notts" paper had been published prior to 
the workshop. 



to high precision (sub-)halo finding, respectively; Sections 3, 4, and 
6) are of particular interest to users of halo catalogues and address 
the definition of halo properties, the uncertainties in their recovery 
and their applications in other fields, respectively. A summary of 
the content in the respective Sections is given here to better guide 
the reader and allow quicker access to the information. 

2 Halo Finding Methods: We separate the actual working 
methodology of a halo finder from the subsequently applied def- 
initions for halo properties discussed in the following Section. This 
section therefore is of a technical nature with likely little interest 
to the end-user of halo catalogues. It allows for greater insight into 
the possible origin of any (dis-)similarities between the finders. But 
note that we are not discussing or presenting individual codes here, 
we rather talk about the methods in general. 

3 Definition of Halo Properties: Given an identical set of par- 
ticles belonging to a halo, there are still various possibilities for 
how to define (and hence calculate) its properties. In this Section 
we present the most commonly adopted working definitions which 
are - in principle - independent of the applied halo finder. 

4 Recovery of Halo Properties: This Section compares the re- 
sults from different halo finders applied to various identical data 
sets. While it is in part a summary of the work presented in Knebe 
et al. (2011) and Onions et al. (2012) it extends these works by 
digging deeper and quantifying the errors. 

5 Precision Cosmology: After presenting hard numbers for the 
differences between codes the questions remain about the origin of 
the scatter, possible ways to improve agreement, and the impact for 
the era of precision cosmology. These topics shall be discussed in 
this Section. 

6 Relation and Application to other Fields: While all previous 
Sections primarily dealt with cosmological simulations and aca- 
demic test cases, here we talk about the relevance of halo finding 
for other fields such as semi-analytical galaxy formation models, 
gravitational lensing, and observables in general. The focal point 
will be to gauge the significance of differences in halo finders (and 
property definitions) for the respective fields. 

Even though we clearly separated the finding methods in Sec- 
tion 2 from the property definitions in Section 3, we emphasize that 
there is a great interplay between those two parts, especially when 
it comes to the centre and velocity of the halo: both these quantities 
are essential for the procedure of removing gravitationally unbound 
particles (forming part of the methods) and hence one needs to bear 
in mind that the division is not entirely straightforward. 

The data used and results presented throughout this work are 
based upon the two earlier comparison projects "Haloes going 
MAD" and "Subhaloes going Notts". However, there are subtle dif- 
ferences to these sets as the former allowed code representatives to 
return their values for halo properties as derived from their respec- 
tive codes, whereas the latter project based the comparison on cat- 
alogues obtained via a common post-processing pipeline applied 
to the provided particle ID lists. Here we also go one step further 
using at times only those objects found by all finders and directly 
comparing the same haloes across codes. 



2 HALO FINDING METHODS 

Here we present a summary of all the steps commonly employed in 
the process of generating halo catalogues starting from the raw out- 
put of a cosmological simulation. For the general reader this may be 
a rather technical section and we therefore encourage everyone not 
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interested in 'flowcharts for halo finding' to skip to the next Sec- 
tion 3 where working definitions for halo properties will be given. 
But a lot of the points discussed here will actually be relevant and 
of importance when it comes to understanding the origin of differ- 
ences between halo catalogues obtained with different finders for 
identical data sets: the steps outlined here and realised in practice 
actually define a halo finder and distinguish it from others. 

In any case, the first two halo finders mentioned in the litera- 
ture, i.e. the spherical overdensity (SO) method (Press & Schechter 
1974) and the friends-of-friends (FOF) algorithm (Davis et al. 
1985) remain the foundation of nearly every code: they often in- 
volve at least one phase where either particles are linked together 
or (spherical) shells are grown to collect particles. While we do 
not wish to invent stereotypes or a classification scheme for halo 
finders there are unarguably two distinct groups of codes: 

• density peak locator (-1- subsequent particle collection) 

• direct particle collector 

The density peak locators - such as the classical SO method - aim 
at identifying by whatever means peaks in the matter density field. 
About these centres (spherical) shells are grown out to the point 
where the density profile drops below a certain pre-defined value 
normally derived from a spherical top-hat collapse. Most of the 
methods utilising this approach merely differ in the way they locate 
density peaks. The direct particle collector codes ~ above all the 
FOF method - connect and link particles together that are close to 
each other (either in a 3D configuration or in 6D phase-space). They 
afterwards determine the centre of this mass aggregation. Please 
note that there is a subtle difference between codes utilizing a hy- 
brid approach, i.e. a sole SO finder will be different from a finder 
that first applies a FOF method and then crops the halo by means 
of SO. 

For a brief technical presentation of all the finders participat- 
ing in the comparison project in one way or the other, we refer the 
reader to Appendix C where their mode of operation is presented. 

2.1 Candidate Identification 

The first step for nearly all non-FOF based halo finders is to gener- 
ate a list of potential halo centres. In most cases, and in particular 
for SO based finders, this is achieved by locating peaks in the den- 
sity field or troughs in the gravitational potential field. Some tech- 
niques such as phase-space (ROCKSTAR) or velocity based finders 
(STF) may have their very own approach, but almost any finder 
comes up with such an initial list which is then processed further. 

A related issue, which is often the last step of any algorithm, 
would be the prescription followed to decide which of the objects 
found are indeed real and which ones are spurious. This may be 
as simple as a threshold on the particle number, but more elabo- 
rate statistical criteria, often specifically tuned for each particular 
technique, are also implemented by several codes. This 'catalogue 
cleaning' process is in practice a problem area as it may be diffi- 
cult to implement in parallel for the largest datasets. The issue is 
that particles from rejected haloes may need to be re-added to ei- 
ther another halo or the background pool in a self-consistent way. 
Thus whether or not the cleaning process is carried out 'on-the-fly' 
as the halo catalogue is built up or as a separate post-processing 
step can make a difference to the finally generated catalogue. We 
would advocate that the final catalogue generated after any halo 
cleaning algorithm has been applied should be independent of the 
location of the halo cleaning in the analysis chain. Unfortunately 
this is presently not always the case. 



2.2 Particle Collection 

Once a candidate list has been generated, one needs to gather those 
particles that likely belong to each and every object. In practice, 
there is again a lot of room for variety in how to achieve this, and 
we will see later on that it may have an influence on the actual halo 
properties. For instance, when dealing with simulations containing 
substructure (like the Aquarius data used during the "Subhaloes go- 
ing Notts" workshop), one question is whether or not the particles 
belonging to a subhalo should also be affiliated to the host halo. 
This essentially boils down to a decision on whether or not any sin- 
gle particle can be in more than one object at the same time. Further, 
haloes will also be affected by either collecting particles in spher- 
ical regions, from arbitrary geometries, or in phase/velocity-space. 
All these issues will leave their imprint on the final halo catalogue. 



2.3 Halo Centre & Built Velocity Determination 

Once a candidate particle list for each halo has been obtained it is 
important to locate its centre as this defines the physical location of 
the object as well as being used within many subsequent analyses. 
There are a variety of possibilities and implementations imagin- 
able for the identification of the centre (see Section 3.1). For in- 
stance, some codes simply stick to the location of the density peak 
or gravitational potential minimum used during the candidate iden- 
tification (e.g. AHF) whereas others use the centre of mass of some 
fraction of the particles. And in the case of extended or stream like 
structures the halo centre can be quite ill defined. In that regards, 
iterative refinement techniques for a robust centre may need to be 
implemented also impacting upon the particle collection discussed 
before. 

Similar issues arise when calculating the velocity of the object, 
which could be calculated from all the particles or some central 
subset or by simply taking the velocity of the most bound particle 
for example. An added complication is any unbinding procedure, 
which may lead to a need to iteratively recalculate the centre and 
bulk velocity as unbound particles are removed. Some codes deter- 
mine the position of the centre first, and then use that information 
to determine the velocity, while others find both at the same time. 
All this will subtly affect the decision of whether a particle is con- 
sidered bound or not as it is the particle velocity relative to the bulk 
velocity that matters for unbinding. 



2.4 Unbinding Procedure 

We have just seen that the centre and bulk velocity determination 
may actually form part of any unbinding procedure during which 
gravitationally unbound particles are iteratively removed. These 
two steps are therefore not necessarily separate tasks. Furthermore, 
the scheme for collecting particles touched upon in Section 2.2 is 
certainly not fully disconnected from the unbinding process either: 
while some codes prefer to adhere to a conservative initial particle 
collection others rely on the fact that a stringent unbinding proce- 
dure will remove any incorrectly collected particles again. Adher- 
ents to this approach point out that if a particle is not included in 
the initial candidate list it can never be added back in later. 

Other obvious differences come from the calculation of the po- 
tential (j> entering the calculation of the escape velocity Wcsc = -\/20 
(against which each particle's velocity is compared), the order 
which particles are removed, and the termination criterion for the 
imbinding process itself. Most of the codes discussed in Onions 
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et al. (2012) calculate 4> using a tree, whereas others make simpli- 
fying assumptions such as spherical symmetry (AHF) or detailed 
surmises about the radial density profile (H0T3D & H0T6D). 
Some codes remove one particle at a time, always using the least 
bound one (GRASSHOPPER), whereas others remove every par- 
ticle considered unbound in one go before re-iterating, and yet oth- 
ers restart the iteration when a certain fraction of the particles have 
been removed. Finally there are various termination criteria for the 
iterations: no more particles are removed, only a negligible fraction 
of the particles have been removed, etc. 

It should be noted that presently some codes do not feature an 
unbinding procedure. The necessity for unbinding is tightly linked 
to the particle collection method. Configuration-space finders will 
always include some dynamically unrelated particles with high rel- 
ative velocities (see, for instance Onions et al. 2013). Consequently, 
we argue that all configuration-space based finders, regardless of 
how conservative the initial particle collection is, require an un- 
binding step to remove false positives. In practice, the addition of 
an unbinding process essentially converts any configuration-space 
based finder into a mock phase-space finder, as unbound particles 
are dynamically unrelated and will be some distance away from 
the object in phase-space. The nature of unbound particles means 
that phase-space finders are more reliable when it comes to picking 
bound particles in the first place. However, we caution that unbind- 
ing is not actually a physically motivated method of pruning the 
particle list (Behroozi et al. 2012). Particles can become marginally 
unbound momentarily, for instance when a subhalo passes through 
dense regions of its host halo, and later become bound. Pruning a 
particle list based on a particle's instantaneous binding energy will 
not return the mass that is dynamically associated with an object. 

The need for an unbinding procedure depends not only upon 
the algorithm but the problem being addressed. If the parameter be- 
ing quantified is not sensitive to a small fraction of interlopers (such 
as the total mass for field haloes) then the errors are likely to be 
small. However, as we show, some properties (such as halo spin) are 
highly sensitive to the presence of unbound particles and for such 
measures an unbinding procedure is essential. Additionally, halo 
catalogues produced by configuration-space based finders without 
an unbinding procedure suffer from a significant amount of con- 
tamination from spurious small objects. Consequently, the number 
of particles required within an object before it can be trusted is cor- 
respondingly much higher than for similar finders with an unbind- 
ing process. For this reason alone it makes sense to always utilize 
an unbinding procedure unless the inclusion of unbound material is 
specifically desired, for instance if studying diffuse streams, tidal 
relics, or gravitational lensing. 

Finally we would like to state that there are also similarities 
in the unbinding procedures adopted by all codes: we all consider 
the object in isolation (i.e. not embedded within an inhomogeneous 
background, be that a host halo or the surrounding universe itself) 
and we all agree that considering the Hubble flow has little if any 
impact (and hence the Hubble flow is not taken into account in 
some codes) . 



2.5 Mass & Edge Determination 

Once a set of (gravitationally bound) particles has been found for 
each halo, one of the most nebulous steps arises: how to find the 
halo edge, a quantity which will in turn also determine its mass (see, 
for instance, Maccio et al. 2003; Prada et al. 2006; Cuesta et al. 
2008; Anderhalden & Diemand 2011; Diemer et al. 2012). This is 
an important procedure because for many purposes we require a 



rank ordering of our objects, whether this be by mass or size, and 
then subsequently attempt to find conversion relations between one 
property and another based on this ranking. 

This topic is closely related to the aforementioned question 
"what is a halo?" (cf. Section 1). The answer is not as straight- 
forward as one might hope and depends on the halo finder and 
the scientific questions in mind. For instance, in studies of gravita- 
tional lensing, one certainly needs to include the substructure in the 
host halo's mass; for (stellar) stream investigations one is actually 
more interested in the unbound rather than the bound particles; for 
shape distributions and correlations with environment spherically 
cut haloes are not the best choice, etc. To add to confusion already 
created with all the ambiguity arising from the steps presented be- 
fore, let us quote here several statements from the lively discussion 
about this subject at the "Subhaloes going Notts" workshop: 

• the halo edge is the distance to the farthest bound particle 

• the halo edge is defined via the spherical top-hat collapse 
model 

• the halo edge is the 'zero-velocity' radius 

• the halo edge is defined by the outer 3D caustic in the trans- 
formation from Lagrangian to Eulerian coordinates 

• as a large region of the universe is bound to every object, we 
should simply use the first isodensity contour that goes through a 
saddle point 

• an object should be defined dynamically: whatever particles 
stay with the object over several dynamical times are part of it 

• do not try to define an edge, provide best-fit parameters to 
some function describing the density profile of each object 

• do not try to define an edge, just provide the (bound) particle 
lists to the user 

It will be up to the user and the actual scientific problem at 
hand to decide which definition serves best. But note that most 
finders adhere to some form of Eq. (2) (see Section 3.2 below). One 
noteworthy exception to this rule though is ORIGAMI (Falck et al. 
2012) that uses the outer caustic approach to both collect particles 
and assign an edge to them (cf. Section CI 5). 

2.6 Tracking of Haloes 

Finding objects in simulations is not necessarily a task that is only 
limited to a single time snapshot. On the contrary, in most cases 
we are actually interested in the temporal evolution of our haloes. 
While this could be achieved by tracking objects between multi- 
ple halo catalogues separated in time, one may also think of basing 
the halo finder upon this approach, as has recently been done for 
HBT (Han et al. 2012) or for MHT (Gill et al. 2004) and SURV 
(Tormen et al. 2004; Giocoli et al. 2008, 2010). A sophisticated 
tracking algorithm may in fact improve the accuracy and credibil- 
ity of halo finders: one can use the tracking results to adjust the 
halo catalogues and remove spurious identification and/or recover 
missing objects (Springel et al. 2001; Gill et al. 2004; Tormen et al. 
2004; Giocoli et al. 2008; Tweed et al. 2009; Giocoli et al. 2010; 
Behroozi et al. 201 1; Benson et al. 2012). 

Note that most of the aforementioned papers are concerned 
with the proper tracking of subhaloes; they all, with the exception 
of Behroozi etal. (2011), devised methods to follow subhaloes after 
infall into their host. Behroozi et al. (201 1), however, extended this 
idea to halo catalogues in general: large, established haloes should 
not be expected to suddenly appear or vanish and the location of 
the halo centre and bulk velocity should not change by unphysical 
amounts between any two outputs. 
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2.7 Treatment of Baryons 

The treatment of baryons is something that is becoming more and 
more important given the fact that the simulations are routinely in- 
cluding them these days. It is well established that baryonic physics 
alters the particulars of dark matter haloes and subhalo populations 
(Blumenthal et al. 1986; Tissera & Dominguez-Tenreiro 1998; 
Maccio et al. 2006; Oiiorbe et al. 2007; Romano-Diaz et al. 2008, 
2009; Libeskind et al. 2010; Duffy et al. 2010; Sommer-Larsen 
& Limousin 2010; Romano-Diaz et al. 2010; Schewtschenko & 
Maccio 2011; di Cintio et al. 2011; Govemato et al. 2012; Zolotov 
et al. 2012; Brooks & Zolotov 2012; Di Cintio et al. 2012; Zemp 
et al. 2012), there remains the question of how this will influence 
the performance of an object finder. Additionally the gas particles 
themselves carry not only kinetic but also thermal energy giving 
rise to thermal pressure in the medium (specified by the adopted 
equation of state). At present, halo finders deal with these sub- 
tleties differently, with some accounting for the gas' thermal energy 
u= I ^T and others ignoring it. Finders like AHF or SUBFIND 
add u to the total specific energy egas (required to be negative for 
bound particles): 



1 2 



(1) 



where <j) the gravitational potential and v the gas particle's veloc- 
ity. A different approach used by some codes (e.g. JUMP-D) is to 
remove all the hot gas from a two-phase medium by making use of 
its bi-modal energy distribution. 

In a recent study (Knebe et al. 2013) we compared a set of 
finders when applied to a simulation that not only models gravity, 
but simultaneously follows the evolution of the baryonic material 
by incorporating a self-consistent solution to the hydrodynamics; 
the simulation further included a model for star formation and stel- 
lar feedback. We found that the diffuse gas content of the haloes 
shows great disparity, especially for low-mass satellite galaxies. We 
nevertheless acknowledged that the handling of gas in halo finders 
is something that needs to be dealt with carefully, and the precise 
treatment may depend sensitively upon the scientific problem be- 
ing studied. We therefore refrain from any in-depth discussions of 
this subject here and only will present a key plot in Section 4.3; the 
details can be found in Knebe et al. (2013). 



2.8 Summary 

We have seen that halo finding is not as simple as passing the raw 
simulation data through some filter (may that be velocity or posi- 
tion filtering). It involves several steps starting from initially gen- 
erating a putative list of halo candidates to eventually locating an 
edge for an object possibly defined by them. Presenting the detailed 
implementation of each of these steps in every single halo finder 
is beyond the scope of this article. We nevertheless provide in Ap- 
pendix C a brief descriptions of all the codes that participated in one 
way or the other in the comparison project; please refer to the refer- 
ences therein for more details. But we would also like to highlight 
that there is no unique implementation: each code applies its own 
way of realising the necessary steps for going from the raw simu- 
lation data to the final halo catalogue. In fact, one cannot come up 
with a unique candidate identification or particle collection method 
as these parts clearly define and characterize a halo finder. For in- 
stance, phase-space finders usually base their particle collection on 
an intrinsically different algorithm than configuration-space find- 
ers (but see e.g. HOT). It should be noted that FOF based find- 



ers combine several of the steps outlined here due to their intrinsic 
simplicity: their candidate identification and particle selection is in 
practice just one step; they further may also not apply an edge defi- 
nition other than the one given by the isodensity contour defined via 
the applied linking length and they do not intrinsically involve any 
unbinding step. And one should not forget that most of the meth- 
ods outlined here are linked to each other. For instance, adhering 
to a certain edge definition method will lead to a code that upfront 
collects its particles in a way tailored to suite that definition. For 
example, a code aiming at collecting out to the zero-velocity radius 
certainly collects particles differently than a code using the first 
shell crossing approach. But the particle collection will influence 
the unbinding procedure as we will have different centres and bulk 
velocities to start with. 

We will see below that this 'freedom of realisation' will lead to 
unavoidable scatter when recovering halo properties with different 
finders. Although there are of course many possible variations, the 
steps outlined here underlie the architecture of almost every halo 
finder, and each of them will introduce some scatter in the phys- 
ical properties of the haloes returned by the different algorithms. 
For end users, it is important to know the magnitude of the scatter 
associated with the most important properties of the haloes (i.e. po- 
sition, velocity, and mass), to be quantified in Section 4. Advanced 
users - and, above all, developers - will also be interested in the 
amount of scatter due to the particular implementation of the dif- 
ferent steps, which will be investigated in Section 5 using the mass 
function of dark matter subhaloes as a reference test case. 

It is important to note, though, that this scatter should not be 
confused with the discrepancies arising from the different defini- 
tions of the same quantity that may be adopted by any given algo- 
rithm. These are discussed in more detail in the following Section. 
However, as we will also see in Section 4 and Section 5 there are 
ways to unify the post-processing once an initial set of particles has 
been gathered and added to the list of putative halo centres. 



3 DEFINITION OF HALO PROPERTIES 

Even if one had a perfect, uncontaminated set of particles associ- 
ated with an object of interest representing the dark matter halo of 
a galaxy or a galaxy cluster, or maybe a stream of tidal debris mate- 
rial, it would still remain unclear how to define its physical proper- 
ties. The most relevant and fundamental are probably the position, 
mass, radius, and bulk velocity of the object. All other properties 
(e.g. l/max, spin parameter, shape, velocity dispersion, concentra- 
tion, etc.) are actually derived properties that mostly require the 
determination of the position and bulk velocity in order to place the 
associated particles into the rest frame of the halo. 

This is an important Section even for end users of halo find- 
ers, as not every code uses the same definitions for extracting halo 
properties. This leads to different, yet still internally correct, re- 
sults. The general user should be aware that the adopted definitions 
will have a significant impact on the final halo catalogues. 



3.1 Centre Position & Bulk Velocity 

Most finders use the peak of the local (phase-space) density field to 
define the centre of a halo. Its spatial location and bulk velocity are 
determined by either the (weighted) average over all the (bound) 
particles in the object or only a certain 'central' fraction of them. 
The chosen fraction, as well as the criterion to define 'central' (e.g. 
spatial and/or velocity distance, binding energy, etc) are specific to 
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each halo finder. Usually - but not always - the same prescription 
is applied to field haloes and subhaloes. 

The position and velocity of the centre play an important role 
in several of the steps described in Section 2, as well as on some of 
the other properties of the halo (see Section 4): differences of the 
order of ten per cent are expected when using different reference 
frames (Ascasibar & Gottlober 2008; Han et al. 2012). 



3.2 Mass & Edge deflnition 

In Section 2.5 we have raised the more general problem of deter- 
mining an object's extent once the problem of associating particles 
to it has been accomplished. Different codes might use different ap- 
proaches of which some had been sketched in the bullet-item list in 
that Sub-Section. Here we simply like to go one step further high- 
lighting that even adhering to one of these methods, e.g. the com- 
monly used virial edge definition via the spherical top-hat collapse, 
might lead to degeneracies in halo mass. 

While most FOF-based finders simply report the cumulative 
mass of all linked particles and derive the radius of a spherical re- 
gion equivalent to the total extent of the halo, other finders use the 
following working definition for both halo mass and radius: 






— Arcf X pr< 



(2) 



where Arcf is a parameter (usually determined from the spherical 
top-hat collapse of an overdense region in an expanding universe, 
and hence a function of the cosmological parameters Qm and ^Ia, 
as well as the redshift z; see, for instance, Courtin et al. 2011) 
and Pref is a reference density (normally either the critical density 
Pcrit = 3H^ /StvG or the background density pt = flm x pcrit). 
The 'freedom' in choosing these two parameters already hints at 
the possible ambiguities in the location of the halo edge (and there- 
fore mass). It must be stressed, though, that there is no right or 
wrong way; users of halo finder catalogues just need to be aware 
that several alternative definitions exist and which one of these has 
been used. For a relation between SO and FOF masses as defined 
above, the reader is referred to Lukic et al. (2009) and More et al. 
(201 1); and for a thorough discussion of different choices for Aref 
and prcf please refer to, for instance, Maughan et al. (2006) and 
the Appendix in Sembolini et al. (2012), respectively. But also note 
that we are not entering the discussion here about the applicabil- 
ity of such a defintion and its implications for the redshift evolu- 
tion of halo mass. We just like to state that Eq. (2) may not be the 
appropriate choice in the end as it leads to spurious (and unphys- 
ical) evolution as, for instance, shown and discussed by Diemand 
et al. (2007), Diemer et al. (2012), and Kravtsov & Borgani (2012). 
To avoid such ambiguities it might therefore be more meaningful 
to use intrinsic scales to characterize and quantify the mass and 
size of haloes such as, for instance, Vmax and i?max (Ascasibar & 
Gottlober 2008; Rnebe et al. 201 1) - as already advocated before 
in Section 1.2. 

Analogously, not all finders consider the mass of a subhalo to 
be part of the mass of the host halo. Which definition is to be used 
depends on the scientific problem at hand, but the end user needs to 
be aware of what the code returns. The mass of the subhaloes them- 
selves depends on how their particles are collected. Again, some 
codes identify a spherical 'tidal radius', whereas others use isoden- 
sity contours (or other prescriptions) to define the subhalo edge. 



3.3 Derived Properties 

The position, velocity, mass, and radius of a halo are the basic prop- 
erties to be returned by any halo finder. However, most codes pro- 
vide additional information (e.g. Vmax, spin parameter, concentra- 
tion, shape, etc.). We will discuss here some of the quantities that 
we consider especially relevant for a large number of users of halo 
catalogues. The relation to several particular fields is explored in 
more detail in Section 6 below. Note that, since most of these prop- 
erties depend on the reference frame of the halo, their actual values 
may be subtly dependent on the adopted prescriptions. 



3.3.1 Rotation Curve 

As already mentioned in Section 1.2, the peak of the circular rota- 
tion curve 



Vmax = max 



GM{< r) 



(3) 



may be a more physically meaningful and stable measure of halo 
mass than any value based upon an ambiguous edge definition (es- 
pecially for subhaloes). Moreover, this quantity is much closer to 
the observational data, as it is possible to measure rotation curves 
(and hence Knax), whereas all the ambiguities related to the outer 
edge of a galaxy apply to observations as well. 

On the other hand, measuring Vmax requires sufficient resolu- 
tion to determine the circular velocity accurately enough: the peak 
position -Rmax will always be reached relatively close to the cen- 
tre of the object. In addition, it should be mentioned that the actual 
procedure for the determination of Vmax could be considered to be 
part of the 'methods' of the halo finder: some codes directly use 
the list of particles sorted in distance from the halo centre, with or 
without smoothing, and locate the peak via some form of interpo- 
lation; other codes prefer to bin M(< r) prior to the peak deter- 
mination. These choices again introduce subtle differences in the 
derived quantity. Further, while Vmax might be easily determined, 
flmax is more ambiguous as the velocity profile can be quite flat. 

The main message is that, due to the noise inherent to the mass 
profile, the practical definition of Vmax and iimax implemented in 
each halo finder is not as simple as 'the maximum of the rotation 
curve'. The scatter due to the different definitions/methods will be 
discussed below in Section 4. 



3.3.2 Spin 

There are two commonly used definitions for the spin parameter of 
a halo 



L J\E 



Xb = 



L 

V2MVR 



(4) 



where 



L = N^TniTi X Vi 



(5) 



is the angular momentum vector of all N particles in the halo, Af is 
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the virial mass enclosed at a virial radius R,V = \JGM/ R the cir- 
cular velocity at R, and E its total energy. The former is the classi- 
cal definition originally introduced by Peebles (1969, subscript P) 
and the latter a simplification of it first introduced by Bullock et al. 
(2001, subscript B) (which reduces to the standard \p when mea- 
sured at the virial radius of a truncated singular isothermal halo). 

The spin parameter can be seen to be a measure of the amount 
of coherent rotation in a system compared to random motions. For 
a spherical object, it is approximately the ratio of its own angular 
velocity to the angular velocity needed for it to be supported against 
gravity solely by rotation (see e.g. Padmanabhan 1993). A detailed 
account of the merits and drawbacks of the two alternative defini- 
tions of the spin parameter is provided in (Hetznecker & Burkert 
2006). 



3.3.3 Shape 

Having identified the set of particles belonging to (and defining the 
shape of) an object, it is common practice to compute their moment 
of inertia tensor Ijk- For a distribution of discrete point masses, Ijk 
is expressed as 

JV 

Ijk ^'^mi{r'tSjk - XijXik) with j, fc = {1;2;3} , (6) 

i = l 

where rrii is the mass of particle i, N the number of particles and 
n = ^x^i + x^2 + ^i3 is the distance of particle i from the centre 
of mass of the particles. The eigenvalues and eigenvectors of Ijk 
are related to the axis ratios and orientation, respectively, of the 
ellipsoid that fits best the particle distribution. A similar ellipsoid 
can be obtained from the tensor 



Mjk = ^ rriiXijXik , 



(7) 



which has been widely used in previous studies. Both forms provide 
axis ratios and orientations that are identical (though the individual 
eigenvalues are different). 

Note that the simplest method determines the shape and ori- 
entation of a halo using all particles within a spherical volume or 
shell at a given radius (e.g. Frenk et al. 1988; Kasun & Evrard 2005; 
Hopkins et al. 2005; Bailin & Steinmetz 2005). While this method 
robustly recovers the orientation of the halo, the resulting axis ra- 
tios tend to be biased towards larger values (i.e. haloes are predicted 
to be rounder). 

An alternative, iterative approach to the problem consists in 
using all the particles within a spherical volume or shell, but the 
initial surface is deformed along the principal axes of the best- 
fitting ellipsoid, and the process is repeated until convergence is 
reached (e.g. Katz 1991; Dubinski & Carlberg 1991; Warren et al. 
1992; Bullock 2002; AUgood et al. 2006; Maccio et al. 2006; Vera- 
Ciro et al. 201 1). Both Jing & Suto (2002) and Bailin & Steinmetz 
(2005) have noted that iterative methods have difficulty in achiev- 
ing convergence in simulations in which haloes are very well re- 
solved and contain a population of satellites. Satellites tend to lead 
to a distortion of the shape, and this is most pronounced in the out- 
ermost parts of host haloes, where recently accreted satellites are 
most likely to be found. 

The impact of substructure can be reduced by working with 
the reduced inertia tensor 



M 



]k 



^ = E 



fTT^i^ij^ik 



(8) 



where each particle is weighted by the inverse square of its distance 
to the centre of the halo. While this recovers accurately the orien- 
tation of the ellipsoid, the axis ratios are systematically overesti- 
mated, and thus haloes are predicted to be more spherical than they 
actually are (e.g. Bailin & Steinmetz 2004). For a more elaborate 
discussion of all these possibilities we refer the reader to a recent 
study by Zemp et al. (201 1); here we would only like to reiterate 
that there is more than one definition of halo shape. 



3.4 Summary 

Not only will halo finders vary in the method used to determine cer- 
tain properties, such as position, mass, and bulk velocity (as cov- 
ered in Section 2); they may also use different definitions as dis- 
cussed here. While the precise way to gather the particles belong- 
ing to an object is indeed the essence of the halo finder, the exact 
definition of its physical properties could in principle be passed on 
to the end user by supplying just the associated particle lists and/or 
physically-motivated fits to the particle distribution. Arguably, this 
would impose an unnecessary burden on the user, and it is normally 
considered much more convenient that the halo finder returns actual 
numeric values for halo properties, according to any particular def- 
inition of its choice (that should hopefully be explicitly stated in 
the halo finder documentation). It is the responsibility of the user to 
understand those definitions and use them consistently when com- 
paring to other numerical, observational, or analytical work. Con- 
version factors or more elaborate recipes may need to be applied to 
switch from one definition to another and have been the subject of 
various investigations in the literature. 



4 RECOVERY OF HALO PROPERTIES 

In this section we address the following question: do halo finders 
(dis-)agree when applied to identical data sets? More precisely, we 
would like to discuss the scatter in the fundamental properties re- 
turned by any halo finder (i.e. position, velocity, and mass of each 
object) as well as in some of the most popular derived quantities, 
such as the maximum of the circular velocity, halo shapes, spin pa- 
rameter, and halo number counts as a function of mass or circular 
velocity. Much in the spirit of previous comparison papers (Knebe 
et al. 201 1; Onions et al. 2012), from which several analyses will 
be borrowed and extended, we will use the scatter of the values re- 
covered by different codes as a first attempt to quantify the uncer- 
tainties that are nowadays associated to the process of halo finding. 
The origin of such scatter, and possible ways to reduce it, will be 
discussed in Section 5. 

For this project we have utilised pre-existing datasets from a 
range of sources. Table 2 summarises which datasets are used in 
each of the following subsections. 

4.1 Field Haloes 

We begin by discussing field haloes extracted by the finders from 
the MareNostrum simulation (Gottloeber et al. 2006) at a range of 
resolutions and previously discussed by Knebe et al. (201 1). This is 
perhaps the easiest scenario for catalogue generation as at this nu- 
merical resolution there is effectively little substructure and the vast 
majority of the haloes found are isolated. In this case the choices 
made and discussed above are not as crucial as we shall see later 
and the different finders generally agree well even if no common 
post-processing pipeline is employed and we just take the mass 
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Table 2. A recap of the data used for the distinct Sub-Sections. 



Sub-Section 


data set and its particulars 


4.1 -Field Haloes 


- MareNostrum simulation (Gottloeber et al. 2006; Knebe et al. 201 1) 

— each halo finder returned its own analysis 


4.2 - Sub-Haloes 


~ Aquarius A-4 (Springel et al. 2008; Onions et al. 2012) 
— common post-processing of supplied particle ID lists 



4.2.2 - 4.2.5 - Error Quantification 



■ Aquarius A-4 (Springel et al. 2008; Onions et al. 2012) 

■ common post-processing of supplied particle ID lists 
sub-set of halo finders featuring reliable unbinding 
sub-set of subhaloes commonly found by all finders 
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Figure 1. Upper left panel: the cumulative mass (M200c) function. The arrows indicate the 50 particle limit for the 1024"^ (left), 512'' (middle), and 256'^ 
(right) simulation data. The thin black lines crossing the whole plot coiTesponds to the mass function as determined by Warren et al. (2006, (solid)) and Tinker 
et al. (2008, (dashed)). The eiTor bars represent the mean mass function of the codes (it Icr). Lower left panel: the fractional difference of the mean and code 
halo mass functions. Upper right panel: Cumulative number count of haloes above the indicated Vmax value. Lower right panel: the relative offset from the 
mean of the cumulative count. The pair of solid lines in each of the residual plots simply indicates the 10 per cent error bars. Note that both properties (i.e. 
mass and Knax) have been determined individually by each code. 



and velocity values returned directly by each group. Even the lack 
of any unbinding procedure in some codes has little impact as for 
a general halo this removes very few particles as the haloes them- 
selves are the background. 

Fig. 1 shows in the upper panels the cumulative mass^ M200C 
(left) and Vmax (right) functions alongside the mean and l-cr stan- 
dard variation for a selection of mass/Vl„ax points as error bars. The 
lower panels show the scatter of each halo finder about those mean 
values. Note that these plots are showing results at various mass 
resolution levels (i.e. 1024"', 512^, and 256"^ particles, see Knebe 
et al. 201 1, for more details) as not all finders have the capability to 
analyse the largest data set; the vertical arrows in the mass function 
indicate the 50 particle limit for the respective resolution. The two 



^ Defined by using A^cf = 200 and p^cf = Pcrit for Eq. (2) 
© 2010 RAS, MNRAS 000, 1^2 



thin lines in the upper left mass function panel represent two analyt- 
ical mass functions based upon fits to the numerical mass functions 
found in cosmological simulations: Warren et al. (2006), who use a 
FOF-based finder for their best-fit model, and Tinker et al. (2008), 
who applied an SO-finder. 

We find that the scatter in mass is at the 10 per cent level and 
actually within the limits given by the two analytical functions. This 
scatter is driven by a variety of sources and is due to the differ- 
ent choices made by the finders. In particular the finders differ on 
whether or not they include the mass of any substructures in the 
halo mass, whether or not they do unbinding and the precise defi- 
nition both the outer edge and halo centre. We note that the differ- 
ences in Vnax are - in the case that each code uses its own method 
to determine i?max and Knax - substantially larger than we shall 
see later and of order 20 — 30 per cent. 

This level of accuracy may be perfectly acceptable for some 
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measurements and indeed within any particular code where the as- 
sumptions employed are conserved convergence would be expected 
at a much higher level. However, these errors should be indicative 
of the level of accuracy at which we can absolutely measure the 
cumulative halo mass function within a cosmological model given 
that, as we stressed earlier, all of the range of assumptions adopted 
here are perfectly physically acceptable and it is hard to argue one 
set is any better than another. 



4.2 Sub-Haloes 

For the rest of this section we will employ a more challenging and 
realistic dataset to answer the question: how well could we expect 
to do if we force the finders to use a common set of definitions? 
The Aquarius simulations (Springel et al. 2008) are a set of Milky 
Way sized haloes studied at a range of resolutions. We have pro- 
cessed the A-4 dataset using a wide range of substructure finders 
and compared the results in Onions et al. (2012). This is a more 
difficult problem as now a single host halo contains several thou- 
sand subhaloes and it is the properties of these we wish to compare. 
Using the same ordering of panels as for Fig. 1, we show in Fig. 2 
the subhaloes' mass M200C and l^ax functions. In contrast to the 
field halo comparison, the mass and Knax were calculated using a 
common post-processing pipeline, i.e. only the candidate identifi- 
cation, particle collection, and unbinding procedure were different 
(cf. Section 2). 

We find that for this more complex problem a successful im- 
plementation of unbinding is essential in order to obtain reliable 
number counts anywhere near the resolution threshold. Note that 
the two finders AdaptaHOP and Mendieta do not feature a (re- 
liable) unbinding procedure: AdaptaHOP (without any unbind- 
ing) finds far too many small objects; MENDIETA does not con- 
tain a reliable unbinding procedure and hence finds too few objects 
across a large range in mass. If we were to use a common unbind- 
ing scheme for both their pre-unbinding datasets their results then 
agree with the majority of the finders. For these reasons we drop 
both AdaptaHOP and Mendieta results from the rest of this 
discussion. 

Neglecting the results from these two finders we see in Fig. 2 
that the scatter for the cumulative mass function is roughly similar 
to the field halo case studied in Fig. 1 despite the fact we are now 
using a common post-processing routine. However, the scatter in 
the V^ax function is considerably less than in Fig. I (for subhaloes 
composed of more than 100 particles). Both of these results are 
to be expected: the mass of a subhalo is sensitive to both the par- 
ticle collection scheme and the unbinding procedure whereas the 
maximum circular velocity is less sensitive to these assumptions 
as this quantity only depends on a small fraction of the most cen- 
tral particles. It is however sensitive to the particular definition and 
implementation used to extract Vinax- 

We would like to close with a cautionary remark about the 
relative residual curves presented in the lower panels of Fig. 1 and 
Fig. 2: these ratios are actually measuring the difference in the num- 
ber of objects found by a certain halo finder above a given mass 
or Vmax threshold, respectively. They do not directly measure the 
differences in mass or Vma.^- In that regard, there are two errors 
entering into these residuals: variations in the number of identified 
haloes (above a threshold) and differences in the recovered mass of 
the same object between finders. The following Sub-Section will 
now focus on the latter effect, quantifying the scatter across finders 
for the same object. 



Table 3. Total number of subhaloes and the ones found in excess of the 
common set of 823 objects of the Aquarius A-4 data set. All subhaloes are 
requested to contain 20 or more particles and have their centres within a 
sphere of radius 250h~^ kpc from the fiducial centre. Fig. 8 indicates that 
the majority of these missing objects are small, containing less than 200 
particles. 



code 


total number of objects 


'excess' objects 


ahf 


1556 


776 


HBT 


1544 


721 


H0T3D 


1254 


442 


HOT6D 


1075 


252 


HSF 


1544 


721 


ROCKSTAR 


1707 


884 


STF 


1521 


698 


SUBFIND 


1433 


610 


VOBOZ 


1863 


1040 



4.2.1 A Common Set of Objects 

Before quantifying the actual deviations between various proper- 
ties, we want to define a set of objects that could be used for this 
purpose. Note that distribution functions will not only suffer from 
differences in individual halo properties but also encode the fact 
that some finders may have identified different numbers of objects. 
To circumvent this we aim at directly comparing quantities on a 
halo-to-halo basis and move on from general distribution functions 
and their variations as discussed above. 

To cross-identify objects we use a halo matching technique 
that correlates all haloes found by a given halo finder to the cata- 
logue of another finder by examining the particle ID lists and max- 
imizing the merit function C = N'^ii^icAK^i^^)^ where A'^sharcd 
is the number of particles shared by two objects, and TVi and N2 
are the number of particles in each object, respectively (see e.g. 
Klimentowski et al. 2010; Libeskind et al. 2010, for more details, 
as well as Appendix B for different merit functions). By restricting 
ourselves to the set of objects found by every halo finder we are 
able to directly compare the properties of the same object across 
all finders. We will discuss the excess objects in Section 5.1.1 and 
caution reader that this common set can be dictated by one finder 
not finding a sufficient number of haloes in the first place. 

Please note that for this analysis we also only used the "Sub- 
haloes going Notts" data set; this project featured a common post- 
processing pipeline based upon individual particle ID lists. These 
lists make subhalo cross-matching as outlined above feasible. In 
order to avoid any possible ambiguities with the exact definition of 
position, bulk velocity, mass, and Knax calculation implemented 
by every algorithm, participants were asked to return only the lists 
of those particles that they consider bound/belonging to each ob- 
ject; the centre, bulk velocity, edge/mass, as well as various de- 
rived quantities were then calculated by a common post-processing 
pipeline, i.e. positions are iteratively determined centre-of-masses 
using the innermost 50 per cent of particles, the bulk velocity is the 
mean velocity of all particles, the mass corresponds to M200C (as 
defined by Eq. (2) when applying A^cf = 200 and prcf = Pcrit), 
Knax is the peak value of the rotation curve, the shape the ratio be- 
tween the smallest and largest eigenvalue of the moment of inertia 
tensor Eq. (7), and the spin parameter As as given in Eq. (4). This 
approach and the use of a common data set, which might be biased 
towards rather clean subhaloes that are easier to detect, means that 
the scatter reported here should be considered lower limits. 
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Figure 2. Cumulative mass Af200c (upper left) and Vmax (upper right) functions for the subhaloes on Level 4 of the Aquarius host A (Springel et al. 2008). 
The arrows in the Vmax function indicate the number of particles inteiior to /?max, the position of the peak of the rotation curve. The error bars represent the 
mean mass and Vmax function, respectively, of the codes (itlcr). Lower left panel: the fractional difference of the mean and code subhalo mass functions. 
Lower right panel: the relative offset from the mean of the cumulative count. The pair of solid lines in each of the residual plots simply indicates the 10 per 
cent error bars. Note that both properties (i.e. mass and Vmax) have been determined by a common post-processing pipeline. 



The plots in the following Sub-Sections 4.2.2 through to 4.2.5 
now all follow the scheme: the x-axis shows the median of the halo 
mass med(M) whereas the y-axis gives the normalized difference 
between the lower and upper percentiles equivalent to the 3rd and 
7th ranked of the distribution across all nine halo finders. We de- 
liberately chose to use medians and percentiles as the distribution 
of properties across finders is highly non-Gaussian and at times bi- 
ased by just one or two outliers. The plots further show medians 
in four mass bins as histograms to highlight any possible depen- 
dence on mass. And those points for which the difference between 
the 3rd and 7th percentile is zero are shown at the bottom of the 
2/-axis. The number of cross-matched haloes is 823 and should be 
compared against the total number of objects found by each indi- 
vidual halo finder given in Table 2 of Onions et al. (2012, Aq-A-4 
row); however, for convenience we list here in Table 3 the number 
of haloes found by each code in excess of the common 823 objects. 
Fig. 8 indicates that the majority of these missing objects are small, 
containing less than 200 particles. 

4.2.2 Position & Bulk Velocity 

In Fig. 3 we start by inspecting the errors in position and velocity 
with the former deviation Apos normalized by the median radius 
for the object, med(7?) and the latter Avd by the median of the 
peak of the object's rotation curve, med( Kriax)- We can see a trend 
for both variations to decrease for more massive objects (especially 
for the position), but the errors are rarely larger than a few percent 
with the overall median error being 0.2 per cent and 0.8 per cent 
for position and velocity, respectively. The trend with mass reflects 
the resolution dependence of the accuracy of both the position and 
bulk velocity of the (sub-)haloes. This scatter is consistent to that 
observed in Knebe et al. (201 1) for mock haloes. 



4.2.3 Mass 

The recovery of subhalo mass is presented in Fig. 4. We no longer 
see a prominent trend with mass anymore. The discrete nature of 
the particle masses is evident; the difference AM (again normal- 
ized by the median of the mass itself) can only be a multiple of the 
actual particle mass which gives rise to the diagonal stripes visible 
in the plot for lower-mass objects. The overall median of the scatter 
is found to be 3 per cent. 



4.2.4 Vmax & -Rmax 

Let us consider next the magnitude and radial location of the peak 
in the rotation curve of the halo, characterised by the values of 
i?max and Vmax, respectively. It has been claimed that these quanti- 
ties provide a good proxy for the mass and spatial scale of the object 
(see e.g. Ascasibar & Gottlober 2008; Muldrew et al. 2011), and 
our previous comparisons (Knebe et al. 2011; Onions et al. 2012) 
show that this may indeed be the case, especially for the maximum 
circular velocity. We can indeed see in Fig. 5 that the scatter in the 
Vmax value (normalized to Knax itself) is lower than for the mass 
having a median of a mere 0.6 per cent. However, the variations in 
^max are naturally larger due to the uncertainty in the determina- 
tion of the peak position: the rotation curves show a flat behaviour 
about JJmax leading to a median error of 2 per cent. 



4.2.5 Shape & Spin 

In Fig. 6 we will now turn our attention towards the shape and spin 
of the objects identified by the halo finders. The calculation of these 
quantities is more involved and hence we expect the scatter to be 
larger, e.g. several of the errors already reported here will propagate 
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Figure 3. Relative errors in tlie recovered positions (top) and velocities (bot- 
toin) of the subhaloes found by all finders. The en'ors are scaled by the 
inedian size of the object and the median Vmax, respectively. 



in a non-linear fashion to these properties. While the shape (defined 
here as sphericity, i.e. the ratio between the smallest and largest 
eigenvalue of the moment of inertia tensor defined by Eq. (7)) ap- 
pears to be determined to approximately the same order of magni- 
tude as the previous quantities giving a median of 5 per cent, the 
spin is less precisely determined with a median of 18 per cent. For 
a more elaborate discussion of the spin using the same data and 
finders as presented here we like to refer the reader to Onions et al. 
(2013). 



Figure 4. Relative errors in the recovered mass for the subhaloes found by 
all the finders. The errors for each object are scaled by the median mass 
found for the object. 



4.3 Galaxies 

In Knebe et al. (2013) we presented a comparison of codes as ap- 
plied to the Constrained Local UniversE Simulation (CLUES) of 
the formation of the Local Group which incorporates much of the 
physics relevant for galaxy formation. We compared both the prop- 
erties of the three main galaxies in the simulation (representing the 
Milky Way, Andromeda, and M33) as well as their satellite pop- 
ulations for a variety of halo finders ranging from phase-space to 
velocity-space to spherical overdensity based codes, including also 
a new mere baryonic object finder. We obtain agreement amongst 
codes comparable to our previous comparisons - at least for the 
total, dark, and stellar components of the objects. However, the dif- 
fuse gas content of the haloes shows great disparity, especially for 
low-mass satellite galaxies. This is primarily due to differences in 
the treatment of the thermal energy during the unbinding procedure. 
We acknowledge that the handling of gas in halo finders is some- 
thing that needs to be dealt with carefully, and the precise treatment 
may depend sensitively upon the scientific problem being studied. 

To give an impression of the differences found we extracted 
all the particles from the simulation data in a spherical region about 
the centre of a certain subhalo. The results can be viewed in Fig. 7. 
We can clearly see that the region about the object's centre con- 
tains a substantial number of gas particles (shown in the upper left 
panel). All codes featuring a treatment of the gas thermal energy 
either during or prior to the unbinding (i.e. AHF and SUBFIND) 
remove essentially all gas from the subhalo; whereas ROCKSTAR, 
which does not include the thermal energy during the unbinding, 
and STF, which does not process the gas and stars through an un- 
binding routine, are left with a residual amount of gas. Note that 
JUMP-D is designed to find galaxies and ignores the dark matter. 

When using AHF in a mode where the gas thermal energy has 
been ignored, AHF basically considers all gas particles seen in the 
left panel to be part of the subhalo. In contrast, due to their phase- 
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Figure 5. Relative error in Vmax (top) and iJmax (bottom). 



/velocity-space nature, both ROCKSTAR and STF consider the ma- 
jority of the gas particles to belong to the background host and keep 
only a small amount of them. For the object considered here the ef- 
fective thermal velocity of each gas particle is always larger than its 
kinetic velocity (not shown here) and hence the grouping in phase- 
or velocity-space will naturally remove (hot) gas whenever a gas 
particle is considered not belonging to it based upon kinetic veloc- 
ity only. Or put differently, the gas component forming part of the 
background halo is prone to be removed by such finders as they in- 
herently use velocity information when grouping and collecting the 
initial set of particles, whereas configuration space finders only deal 
with velocities (either kinetic or thermal) in a (post-processing) un- 
binding procedure. On a side note, a visual inspection of a larger 




^;^¥&mtK^. 






med(MgooJ [Mg/h] 

Figure 6. Relative errors in shape (top) defined as the ratio between the 
smallest and largest eigenvalue of the moment of inertia tensor defined by 
Eq. (7) and spin parameter A^ (bottom). 



region about this particular sample satellite galaxy indicates that it 
has passed extremely close to its host already and been subjected 
to severe tidal forces; this might also explain why RoCKSTAR as- 
sociates gas to one side of the galaxy. 



4.4 Summary 

There are several sources of uncertainty in the properties computed 
by halo finders. While Section 3 was concerned with the ambigui- 
ties arising from the definition of each quantity, here we have also 
quantified the scatter due to the different procedures followed by 
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Figure 7. Visualization of a subhalo showing all particles inside a spherical 
region about the identified centre (upper-left panel) versus the actually iden- 
tified particles of the individual halo finder showing all types of particles: 
gas (blue), stars (red) and dark matter (black). 



each halo finder in order to compute the same quantities from the 
same data. Our results are succinctly summarized in Table 4. While 
the differences are below 1 per cent for position, bulk velocity and 
Vmax (and still marginal for masses), they can rise to over 15 per 
cent for certain derived quantities such as spin parameter. With- 
out any further investigations, these numbers could in fact indicate 
reasonable error bars to be attached to any study based upon the re- 
sults derived from a single halo finder. However, the values listed in 
Table 4 should be considered lower limits as we have restricted the 
comparison to objects found by all halo finders and used a common 
post-processing pipeline. 

However, this is a rather academic situation as in reality nei- 
ther are the objects found by each halo finder restricted to some 
common set nor will the properties be calculated applying the same 
method or definition or even common post-processing pipeline. To 
this extent we also presented the scatter in the general mass and 
Vma.^ functions noting that it is in fact larger than the lower limits 
derived before. But what is responsible for this scatter? Are there 
ways to understand its origin and hence possible post-correct for 
it? The sources of the scatter are certainly two-fold, i.e. the indi- 
vidual particle collection and unbinding procedures of each finder. 
Different finders do not necessarily find the same set of objects and 
different finders retrieve variations in the properties of the same ob- 



Table 4. Scatter in the main properties computed by the halo finders. Note 
that this error is only a lower limit, especially for the numbers based upon 
the set of common objects. 



Quantity 


Scatter 


set of common objects: 




Position 


< 1 % R200 


Bulk velocity 


< 1 % Vmax 


M200C 


3% 


V^max 


< 1% 


^max 


2% 


Shape 


5% 


Spin 


18% 


full catalogues: 




Mass function 


10% 


Vmax function 


20 - 30 % 



ject. We explore these possibilities in more detail in the next section 
and discuss the relevance of this variation for scientific applications 
in Section 6. 



5 PRECISION COSMOLOGY? 

One of the most pressing questions arising from the plots presented 
in the previous Section is: "Why is there a residual scatter between 
halo finders of up to 10 per cent?" The finders have been applied 
to the same data, subjected to a common post-processing pipeline, 
and, in some cases, even compared on a set of objects identified 
in common. As mentioned before, unless we can be certain which 
halo-finding technique is the best (if such a statement can be made 
at all), the observed scatter indicates the accuracy to which we can 
determine these properties in cosmological simulations. In this sec- 
tion, we will try to pinpoint the origin of the observed scatter by ex- 
amining the contribution of the different steps outlined in Section 2. 
Our final goal is to see if it is possible to bring the errors incurred by 
halo finding down to the one per cent level demanded by precision 
cosmology (e.g. Tinker et al. 2008; Komatsu et al. 201 1). 



5.1 Origin of the Scatter 

What are possible sources for the observed scatter? We have al- 
ready seen that there are two fundamental steps involved in obtain- 
ing halo catalogues, i.e. 

• halo finder methodology (Section 2) 

• definition of halo properties (Section 3) 

In that regard, the differences seen in Fig. I combine uncertain- 
ties in both these steps whereas Fig. 2 is only subjected to the first 
point thanks to the common post-processing pipeline. For the time 
being we will actually leave the investigation of differences due 
to the second point aside and only focus on the mode of opera- 
tion of halo finders. But one also has to go one step further and 
ask whether one is interested in the actual halo-to-halo scatter or 
systematic errors affecting the whole halo ensemble. This is an im- 
portant point as we have seen that the variations in, for instance, the 
halo mass for cross-identified objects is of order 3 per cent (cf. Ta- 
ble 4) whereas the ensemble mass function of the same data set seen 
in Fig. 2 clearly shows larger scatter: while the number of uniquely 
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found haloes is 823, each finder nevertheless found approximately 
the same number of objects not part of the common pool (cf. Ta- 
ble 3). This suggests that once the halo finders do find the same set 
of objects, the errors across them should decrease approaching the 
first set of values listed in Table 4. As we have shown, most of the 
missing objects are small. Hence one way to improve the 'purity' 
of a catalogue is to only rely on objects containing more than 300 
particles. This shouldn't be too surprising as this is the same limit 
found elsewhere (e.g. Onions et al. 2013) to be required if stable 
derived halo properties (such as the spin parameter) are desired. 

Focusing on the halo finder methodology and following the 
scheme of Section 2, all the differences between one halo finder and 
another must fall into one of the following categories, representing 
the basic steps required to end up with a halo catalogue: 

• candidate identification 

• halo centre and bulk velocity determination 

• particle collection and unbinding procedure 

• mass and edge determination 

We will now follow a divide-and-conquer strategy, in an at- 
tempt to isolate those steps that are responsible for most of the 
scatter, again using the Aquarius A, Level 4 data set and restricting 
ourselves to the commonly post-processed particle ID lists of some 
of the finders featuring a credible unbinding procedure. While we 
include mass and edge determination in this list we do not discuss 
it below as with a common post-processing pipeline this step is the 
same for all finders and introduces no scatter. This highlights the 
issue mentioned above, that in actuality a common post-processing 
routine is not used by everybody and significant scatter can be in- 
troduced at this stage unless great care is taken. 

5.7.7 Candidate Identification & "Excess" Objects 

There is one element that could make a sizeable contribution to the 
scatter in the mass and Knax functions: whether a given halo is 
detected or not. Remember that the deviations reported in Table 4 
were based upon a common set of objects, but that each finder cer- 
tainly found (substantially) more objects than defined by this com- 
mon set (cf. Table 3). For every subhalo found by a given reference 
code in the Aquarius A-4 data we now identify its counterpart in 
the particle ID lists of all other codes. This was again accomplished 
by examining the particle lists and maximizing the merit function 
C = A'g\aj(,d/(-'ViA'^2), where A'sharcd is the number of particles 
shared by two objects, and TVi and 7V2 are the number of parti- 
cles in each object, respectively (see Appendix B for more details). 
This is the same procedure as applied before, but this time we aim 
at quantifying how many of the objects found by a given halo finder 
were found by the other finders. 

To investigate this, we plot in (the upper panel of) Fig. 8 the 
fraction of codes that also found the same object on the t/-axis 
against the (binned) number of particles of the object in the ref- 
erence code. We can see that there is a group of codes for which all 
other codes found the same objects (with also approximately the 
same number of particles). In the lower panel of Fig. 8 we show the 
actual number of objects entering into the histograms of the of the 
upper panel, i.e. the number of haloes found by each finder in the 
respective number of particles bin. 

While there are differences visible in Fig. 8, how will they af- 
fect, for instance, the (sub-)halo mass or Vmax function - our usual 
measure for the scatter? This can be viewed in Fig. 9 where we 
show in the top panel the subhalo mass function and in the bot- 
tom panel the Vmax function. The two set of lines in each panel 
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Figure 8. The fraction of codes finding the same object as the reference 
code given in the legend as a function of the binned object's number of 
particles in that reference code (upper panel) as well as the actual number 
of objects entering the comparison in the respective number of particle bin 
(lower panel). 



refer to the original functions based upon all objects found by each 
finder (shifted upwards by a factor of 2 for clarity, cf. Fig. 2) and 
the same function using only the set of common objects defined in 
Section 4.2. 1 (shifted downwards by a factor of 2 for clarity). The 
two lower panels in each of the larger panels is the fractional differ- 
ence of the curve for a given finder to the mean value. Remember 
that these curves quantify the variation in the number of objects 
found above a certain mass threshold; they do not directly measure 
differences in halo mass. We see here that the restricting ourselves 
to the common set does not change the observed scatter in the mass 
function, whereas we find a marginal improvement for Vmax. Note 
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Figure 9. Subhalo mass M20OC (top) and Knax (bottom) functions for all 
identified objects (upper set of lines) and the common set of objects (lower 
lines). In each case the upper and lower curves have been scaled by a factor 
of 2 to separate them. The pair of solid lines in each of the residual plots 
indicates the 10 per cent error bars. 



that the peifect agreement of the functions for the common set at 
the 20 particle limit imposed is artificial as the number of objects is 
identical by construction. These results indicate that the scatter is 
not dominated by objects which are not uniquely found by all find- 
ers. These objects do contribute particularly at the low-mass end 
where their consistent detection becomes difficult, but variations in 
halo properties from code to code are principal source here. One 
word of caution here, the error estimates presented in the previous 
Sub-Sections 4.2.2 through to 4.2.5 (and summarized in Table 4) 
are smaller than the scatter seen here as they only took into account 
the difference in the 3rd and 7th percentile and hence ignoring out- 
liers seen here. 

Studying the common set of objects immediately raises the 
question about the non-common set of haloes, i.e. differences in 
the 'excess' objects whose number appears to be non-negligible 
(cf. Table 3). Are these primarily low mass subhaloes? Where in 
the host halo do they lie? To answer these questions we present in 
Fig. 10 both the cumulative excess halo mass function (upper plot) 
and the cumulative radial distribution of these objects (lower plot, 
normalized to the respective total number of excess haloes) always 
in comparison to the full set of haloes. The upper plot additionally 
shows again the residuals with respect to the mean. 

We can appreciate that these haloes are primarily of low mass 
and composed of less than 100 particles, respectively, but found at 
all possible radial distances. And it comes at no surprise that those 
objects found close to the host centre are forming part of the excess 
set. But we also remark that some of the halo finders actually found 
subhaloes very close to the host's centre whereas others do not. We 
recommend to view the radial distance plot in relation to Fig.7 of 
Onions et al. (2012) that shows the cumulative mass in subhaloes 
as a function of distance. 



J. 1.2 Centre and Bulk Velocity Determination 

Given that we found a fair agreement for (sub-)halo centres in 
Fig. 3, and that there is little difference between Fig. 1 (no common 
post-processing) and Fig. 2 (common post-processing), it seems 
unlikely that the details of the location of the centre or the calcula- 
tion of the bulk velocity of an object make a significant contribution 
to the error budget. This is also in agreement with the findings re- 
ported for the mock haloes studied in Knebe et al. (201 1) and some 
additional tests (not presented here) where varying the centre and 
bulk velocity definition in our common post-processing pipeline al- 
ways yielded comparable results. We rather conjecture that either 
the particle collection method or the particulars of the unbinding 
procedure may be held responsible for the scatter, both to be stud- 
ied in the following Sub-Section. 



5.1.3 Particle Collection and Unbinding 

Obviously, one cannot come up with either a unique candidate 
identification or particle collection method, as these clearly define 
the halo finder in question: for instance, phase-space finders base 
their particle collection on an intrinsically different algorithm than 
configuration-space finders, and FOF-based methods usually com- 
bine these two steps into one single procedure - and they may also 
not apply any (additional) edge definition other than the isodensity 
contour traced by the applied linking length. Unbinding is also an 
important - and delicate - step, and the particular way of carrying 
it out may be regarded as an intrinsic piece of some algorithms. 
For the Aquarius A-4 data set we now separate the two steps 
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Figure 12. Escape velocity profile for a sample (massive) subhalo as determined by a sample of the halo finders prior to their unbinding procedure (left panel) 
and after individual unbinding (right panel). The ranges indicate the mean, max, min and quartiles for each radial bin. 



of collecting particles and removing unbound particles from each 
other and show in Fig. 1 1 the resulting subhalo mass functions (up- 
per in-set panels) and the resulting variations (lower in-set panels). 
The top plot shows the subhalo masses as given right after the col- 
lection of particles considered potentially belonging to an object 
by each finder, whereas the bottom plot takes those subhaloes and 
subjects them to a common unbinding procedure. Please note that 
not all halo finders participated in this particular test. While there 
are huge differences in the amount of particles that each algorithm 
chooses to evaluate for (un)binding, applying a common procedure 
to all those candidate objects reduces the scatter to the level al- 
ready seen in Fig. 2, where the unbinding was left to the individual 
halo finders. On a separate note, this figure also clearly indicates 
how important unbinding is for configuration- space finders versus 
phase-space finders. The mass function for configuration finders 
such as AHF, Mendieta, VOBOZ, and SUBFIND changes sig- 
nificantly after unbinding whereas the distributions for H0T6D, 
ROCKSTAR, and STF do not. 

To quantify the influence of unbinding even more we per- 
formed yet another test: we requested participating halo finders to 
provide the escape velocity profile for a given pre-selected (mas- 
sive) subhalo originally found across all analyses. The results are 
presented in the left panel of Fig. 12, where we plot the calculated 
escape velocity ticsc ~ \/2|0| at each particle position, normal- 
ized by the average over the four partaking codes. First, one can 
see that different codes do in fact use different methods to calcu- 
late the gravitational potential and hence escape velocities. This 
is particularly the case for AHF, which uses a spherical approxi- 
mation when calculating the potential </>, whereas the other three 
codes base their (p value on a tree-construction of the particles. 
The spikes (dips) in the profile are sub-substructure objects that 
are not resolved by the (non-realistic) spherical method applied in 
AHF. The right panel shows escape velocity profiles after unbind- 
ing. The differences have marginally increased for the tree-based 
potential calculation, reflecting again the particulars of the individ- 
ual unbinding procedures, such as what velocity is used to deter- 
mine whether a particle is (un)bound. The few percent uncertainty 
in the escape velocity of the individual particles eventually leads 
to a 0.5 per cent error in the different masses after unbinding here. 
Consequently, unbinding is unlikely to be a primary source of the 
scatter observed (see also Table 4). 



Before we continue, we should emphasize that the term 'un- 
binding' is a little misleading. First, these structures do not exist 
in isolation but are treated as such when determining the binding 
energy. A particle near the edge might be considered bound if an 
object is treated in isolation but could be stripped by tidal forces. 
Second, a reference frame must be chosen to determine the kinetic 
energy of a particle. Some codes may choose to use an object's bulk 
velocity, others may use the velocity of the density peak, and other 
may use the velocity of the particle residing at the minimum of 
the gravitational potential. All of these are valid choices. Third, all 
so-called unbinding procedures use the instantaneous energy of a 
particle. Though particles that are false positives with high relative 
velocities are unlikely to remain with a subhalo, particles that are 
deemed to be marginally unbound might only be tidally stripped on 
timescales approaching a dynamical time. Only particles that leave 
the phase-space volume of a subhalo on timescales much shorter 
than a dynamical time can truly be said to be unbound. However, 
such a criterion is rarely considered when determining the 'bound' 
mass of a subhalo. 

Despite these complications, we have to acknowledge that 
even with a common unbinding procedure there still remains a sig- 
nificant degree of scatter. This is not surprising given the large dif- 
ferences in the set of collected particles (i.e. upper panel of Fig. 11). 
We conclude that errors in the subhalo mass of order 10 per cent 
result from the different initial particle collection schema intrinsic 
to each method. 



5.2 Potential Improvements: facilitating merger trees 

As already touched upon in Section 2.6 it is often the case that we 
not only have the end-state of a particular simulation but also the 
evolutionary history at a series of earlier times. We can then make 
use of this additional information to calculate the temporal evo- 
lution of haloes. It has been shown that software tracking haloes 
across snapshots can actually improve the results and reliability of 
halo finders (Springel et al. 2001; Gill et al. 2004; Tomien et al. 
2004; Giocofi et al. 2008; Tweed et al. 2009; Giocoli et al. 2010; 
Behroozi et al. 2011; Benson et al. 2012). This is achieved by ex- 
plicitly ensuring dynamical consistency of halo properties across 
timesteps. Note that a lot of these papers primarily deal with sub- 
haloes and following them after infall into the host. But the publicly 
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Figure 11. Highlighting the relevance of a common unbinding procedure: 
we show the same subhalo mass function as in Fig. 2, computed from raw 
particle lists returned by each code without unbinding (top panel) and with 
a common post-processing routine that also features a common unbinding 
procedure (bottom panel). The pair of solid lines in each of the residual 
plots simply indicates the 10 per cent en'or bars. 



Figure 10. Subhalo mass (top) and radial distance (bottom) functions for all 
identified objects (upper set of lines) and the 'excess' haloes (lower lines). 
In each case the upper and lower curves have been scaled by a factor of 5 
to better separate them. The upper plot also shows (in the lower panel of it) 
the residuals with respects to the mean. 



available tool presented by, for instance, Behroozi et al. (2011) is 
applicable to both distinct haloes and sub-haloes. It follows a pre- 
liminary merger tree but simultaneously integrates the movement 
of haloes backwards in time: knowing the positions, velocities, and 
mass profiles of haloes at one timestep, one may use the laws of 
gravity and inertia to predict their properties at adjacent timesteps. 
This allows the removal of spuriously defined objects or for the in- 
sertion of objects not properly identified by the halo finder in the 
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first place. The method employed by Behroozi et al. (2011) im- 
proves the completeness of the halo catalogues in general, particu- 
larly at earlier redshifts, even though the results obviously depend 
on the completeness of the catalogue used as the initial input (gen- 
erally the z = Q snapshot analysis). For more elaborate details we 
refer the interested reader to the aforementioned paper 

Such merger trees are now routinely used as inputs to semi- 
analytic models of galaxy formation to provide the backbone within 
which galaxy formation takes place (e.g. Cole et al. 2000; Benson 
et al. 2002; Croton et al. 2006; De Lucia et al. 2006; Bower et al. 
2006; Bertone et al. 2007; Font et al. 2008). Therefore, Benson 
et al. (2012) have studied the convergence in the stellar and to- 
tal baryonic masses of galaxies, distribution of merger times, stel- 
lar mass functions and star formation rates in the GALACTICUS 
model of galaxy formation (Benson 2012) as a function of the num- 
ber of snapshots used to represent dark matter halo merger trees. 
They found that at least 128 snapshots are required in-between red- 
shifts 2 = 20 and z = to achieve convergence to within 5 per 
cent for galaxy masses, highlighting again the importance of 'suf- 
ficient' temporal information for the post-processing of the halo 
catalogues. 

Further, the utilisation of dark matter halo merger trees entails 
another 'feature' for workers in the field of semi-analytical galaxy 
formation, the so-called 'orphan galaxies' Gao et al. (2004): it can 
and does happen that a dark matter subhalo dissolves due to tidal 
forces (and lack of numerical resolution) while orbiting in its host 
halo (e.g. Gill et al. 2004, for a study of these disrupted subhaloes). 
However, a galaxy having formed prior to this disruption and re- 
siding in it should survive longer than this subhalo. Therefore, it 
became standard practice to keep the galaxy alive even though its 
subhalo has disappeared, calling it 'orphan' galaxy (Springel et al. 
2001; Gao et al. 2004; Guo et al. 2010; Frenk & White 2012). This 
is another example of the benefits of using merger trees once the 
simulation of the halo finder does not provide sufficient informa- 
tion anymore. 

But in any case, such merger tree based comparisons and im- 
provements, respectively, are beyond the scope of the present work, 
but will certainly form part of the next halo finder workshop sched- 
uled for 2014. 



5.3 Summary 

We have been able to associate some of the scatter between the 
halo finders to inconsistent property definitions and differences in 
the unbinding routines. However, the most significant part of the 
scatter seems to stem from the differences in the initial particle col- 
lection. Given the present situation, the best halo finders can do for 
precision cosmology is to provide error bars of 10 per cent on halo 
mass and Knax functions of which the prime contribution comes 
from the initial particle collection and (to a smaller degree) sub- 
tleties in the way the unbinding is performed. Note that these two 
parts are intrinsic to the method of the halo finder While one might 
still aim at outsourcing the unbinding, the actual collection of par- 
ticles is the halo finder stripped down to its bare minimum. Making 
any changes here is equal to simply using another halo finding tech- 
nique. 

We nevertheless need to remind that reader that the situation 
is a bit different for field haloes and subhaloes. For the latter, edges 
and detection thresholds are defined so differently among the find- 
ers that a 10 per cent mass difference after common unbinding is 
not inexplicable. For the former, the reasons for discrepancies are 
substantially reduced to issues of definition (and possibly bugs). In 



that regards, we can only re-iterate that the end-user of any halo 
catalogues needs to make sure to understand the mode of operation 
of the halo finder upon which the catalogue is based. But the good 
news here is that for basically all properties studied here the error 
decreases with increasing number of particles in the object. 

But we also need to bear in mind that there is a subtle differ- 
ence between disparities in general distribution functions and vari- 
ations of properties of individual objects. The aforementioned er- 
ror of approximately 10 per cent may in part be driven by the fact 
that halo finders return different numbers of objects above a certain 
threshold (be that mass, Vmax, etc.), but we have shown that this 
effect is virtually negligible at all but the smallest masses (see top 
panel of Figure 10). This eiTor is mostly due to the scatter in the 
mass assigned to each individual halo, but it is exacerbated by the 
steepness of the cumulative mass function. Our one-to-one compar- 
ison of halo masses (Figure 4) finds that this scatter is substantially 
smaller, of the order of 3 per cent only. 

There are clear indication that a more sophisticated construc- 
tion of merger trees might reduce incompleteness (Behroozi et al. 
201 1; Han et al. 2012), albeit that any possible biases in the initial 
halo catalogue upon which the revised trees will be based will re- 
main. But the question still is, whether or not (and how) this will 
affect the end-user of halo finders and the scientific applications 
across fields that require halo catalogues as an input. We shall dis- 
cuss these implications in the following Section. 



6 ASTROPHYSICAL APPLICATIONS 

The applications of (sub-)halo catalogues of numerical simulations 
spread over various fields in astrophysics. These range from facili- 
tating the interpretation of numerical simulations over constituting 
input to semi-analytic models to comparison and analysis of obser- 
vational data. In this section, we comment on the different applica- 
tions and the influence/relevance of the halo finding uncertainties 
on the results in other fields. 



6.1 Galaxy Formation, Semi-Analytics & Merger Trees 

We start with one of the biggest topics, i.e. the formation of galax- 
ies within a cosmological context. There are presently two routes 
to this: one is to directly simulate cosmological volumes includ- 
ing all the relevant baryonic physics (see Scannapieco et al. 2012, 
for a recent comparison of various baryonic physics and methods), 
another is to defer to dark matter only simulations and apply semi- 
analytical recipes to them (Cole et al. 2000; Benson et al. 2002; 
Croton et al. 2006; De Lucia et al. 2006; Bower et al. 2006; Bertone 
et al. 2007; Font et al. 2008). And the latter is in fact one of the ma- 
jor areas that requires well understood and carefully constructed 
halo catalogues. 

Semi-Analytical Galaxy Formation Modem semi-analytical 
codes (Croton et al. 2006; Somerville et al. 2008; Monaco et al. 
2007; Henriques et al. 2009; Benson 2012) usually take as input 
halo merger trees derived from large numerical simulations rather 
than purely analytical forms such as Press & Schechter (1974) or 
Extended Press-Schechter (Bond et al. 1991). Stable semi-analytic 
models require stable and physically realistic merger trees: haloes 
should not dramatically change in mass, size or jump in physical lo- 
cation from one step to the next. As we have seen above all of these 
changes can result from a poorly constrained halo finder, with halo 
size being particularly ill-determined. If care is not taken with the 
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choice of the halo centre this can move dramatically from one step 
to the next, particularly during a large merger event. This dynam- 
ical process also leads to large-scale bridging between structures 
during the initial particle collection phase. This can lead to sub- 
stantial changes in an objects mass in a very short timescale. Such 
changes are unphysical. Great care also needs to be taken to con- 
struct a clean halo catalogue from which to build the trees. The 
primary concern here is that only gravitationally bound structures 
are used, otherwise, particularly in regions adjacent to large objects 
where the background density is already enhanced, spurious group- 
ings can be claimed. Thus we recommend that only halo finding al- 
gorithms with a well tested unbinding stage should be used to cre- 
ate catalogues that are intended to form the basis of a semi-analytic 
model. 

Subhalo tracking is also sometimes incorporated into semi- 
analytic models. We caution that care should be taken with this ap- 
proach as some algorithms (e.g. SUBFIND) are very conservative 
in their allocation of mass to substructures. In this case the sub- 
halo mass can drop dramatically as the structure moves closer to 
the centre of a host, occasionally vanishing entirely only for it to be 
partially recovered again afterwards (see, for instance, Figs. 10-12 
in Knebe et al. (201 1) or Figs. 4 and 5 in Muldrew et al. (201 1)). 
While all tested finders do a good job of actually locating substruc- 
tures, even close to the very centre of the host halo, the recovered 
subhalo mass as a function of the distance to the host halo centre is 
a function of the particular halo finder you are using (Knebe et al. 
2011; Muldrew etal. 2011). 

This effect certainly influences the so-called orphan galax- 
ies (cf. Section 5.2), that is galaxies that are no longer associated 
with a dark matter subhalo. How these galaxies are treated varies 
from model to model. The most common approach is to associate 
the galaxy with the most bound dark matter particle at the snap- 
shot just before the subhalo has vanished from the halo catalogue. 
Subsequently either this particle is tracked or an orbit is estimated 
based on the properties of this particle and the galaxy is allowed 
to survive for a merging time computed using the Chandrasekar 
formula (e.g. Gao et al. 2004; Guo et al. 201 1). None of the cur- 
rent semi-analytical codes account for the possibility that the sub- 
halo will 'reappear' at a later time. However, some codes producing 
merger trees do attempt to correct for this artificial subhalo removal 
(Behroozi et al. 2011; Han et al. 2012). The effect this correction 
will have on the galaxy population produced by semi-analytic mod- 
els will certainly be one of the hot topics discussed at future work- 
shops. 

A related issue is that of major merger events. As soon as two 
nearly equal mass haloes approach each other and their radii start 
to overlap, how will halo finders deal with this? Some codes (e.g. 
AHF) have the intrinsic problem that one of them will be tagged 
'host' and the other 'subhalo' at some stage of the merger; this 
will then evidently lead to a mis-assignment of mass. For a more 
elaborate study and discussion of this effect we refer the reader to 
Behroozi et al. (in prep.). 

Another important input to semi-analytical models is the spin 
parameter of dark matter haloes as the size of the galactic disk re- 
siding within them is often derived from this parameter, via i.e. 
conservation of angular momentum leading to the formation of a 
rotationally supported disk (e.g. Mo et al. 1998). Some models as- 
sume a simple relation between a disk size and the spin parameter 
and use a fiducial value for the spin parameter, often derived from 
simulations (Lu et al. 20 1 1 ). Others use the current angular momen- 
tum or spin parameter of the host halo to determine the angular mo- 
mentum of the disk (e.g. Guo et al. 201 1; Benson 2012) and these 



codes require this information to be present in the halo merger tree. 
However, the spin is the least faithfully recovered quantity amongst 
different halo finders (e.g. Section 4) as it is the most sensitive to 
the unbinding procedure. In fact. Onions et al. (2013) showed that 
the spin parameter can be used as a metric to determine how well 
a (sub-)halo finder prunes its particle collection of high velocity 
interlopers. The impact of an unstable or poorly measured spin pa- 
rameter is likely small given the uncertainty in the baryonic physics 
used in semi-analytical modeling to convert a halo spin to the spin 
of a gas and stellar disk. However, given its use as a metric for as- 
sessing the quality of a halo catalogue and its use as a diagnostic 
for identifying unrelaxed (read merging) haloes (e.g. Klypin et al. 
201 1), it warrants further investigation before its default inclusion 
in halo merger tree construction. 

One quantity commonly derived with semi-analytical model- 
ing is the luminosity of the galaxy residing within the dark mat- 
ter (sub-)halo identified by the object finder. And typically ob- 
servational luminosity functions are better constrained than mass 
functions that consist of derived quantities. However, comparisons 
with luminosity functions from numerical simulations are not only 
challenging because of the differences between various (sub-)halo 
finders: currently, the variety of subgrid models in hydrodynamical 
simulations of galaxy formation introduces significantly larger un- 
certainties (Scannapieco et al. 201 1), and the same is true for semi- 
analytical modeling (e.g. Snaith et al. 201 1). This shows that with 
the current state of (sub-)halo finders luminosity functions from nu- 
merical simulations can be used to investigate the physical effects 
of different subgrid models without concern with respect to any 
specific halo finder used. 

Finally, aside from variations introduced by using different 
halo finders, Avila-Reese et al. (2003) investigate the effects of 
non-Gaussian initial conditions on (sub-)structure in CDM. They 
found that the spin parameter distribution depends on the amount 
of non-Gaussianity in the initial conditions, though at 100 particles 
their minimum halo mass is significantly below what is required for 
reliable spin measurements (Bett et al. 2007; Onions et al. 2013). 

Galaxy Formation Simulating the evolution of the visible Uni- 
verse is much more complex than just following its dark compo- 
nents, because it requires understanding the many astrophysical 
processes which drive the evolution of the baryonic component un- 
der the gravitational influence of the dark matter. But neverthe- 
less, this is the same problem semi-analytical modelers are fac- 
ing, with the only difference here that the dynamics of the bary- 
onic/coUisional component is followed by means of integrating the 
corresponding equations of hydrodynamics. This entails that one 
specific model for sub-grid physics like star formation and stel- 
lar feedback requires one full simulation to be run. The difference 
to semi-analytics is on the one hand the presence of gas and stars 
in the simulation and on the other hand the intrinsic handling of 
merger trees. While the latter alleviates part of the aforementioned 
problems with the construction of merger trees, one nevertheless 
needs to follow the formation and evolution of galaxies through- 
out the simulation: some of the hottest topics in the field nowa- 
days, have to do with the morphological evolution of galaxies and 
the properties of the stellar populations building up the galaxies. In 
these scenarios, the role of mergers (Toomre 1977; White & Rees 
1978) and its connection to the cosmic web (Aragon-Calvo et al. 
2007; Hahn et al. 2007; Falck et al. 2012; Libeskind et al. 2012) 
seems to be crucial. These are topics where hydrodynamic simu- 
lations produce the most important advances in order to compute 
the required merger rates, fraction of masses in mergers, and stel- 
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lar populations brought by merger. It is therefore fundamental to 
have a well defined way to identify host haloes and subhaloes ~ 
and to construct reliable merger trees for them - as well as cos- 
mic web classifiers (Aragon-Calvo et al. 2007; Hahn et al. 2007; 
Hoffman et al. 2012), something which is beyond the aim of this 
work though. The identification of objects (haloes and galaxies) in 
such hydrodynamical simulations poses further challenges to halo 
finding techniques. Though we find excellent agreement between 
different finders regarding the dark matter and stellar mass associ- 
ated with a galaxy, we find significant differences in the gas content, 
highlighting the need for a well-defined treatment of gas (cf. Sec- 
tion 4.3 and Knebe et al. 2013). 

6.2 Large-Scale Structure 

Precision cosmology is often synonymous with large-scale struc- 
ture (LSS). The growth rate of LSS is directly sensitive to the ex- 
pansion rate of the Universe, and hence is an excellent probe of 
cosmological parameters. The question is whether the required pre- 
cision in theoretically derived quantities is attainable. For instance, 
the theoretical dark matter halo mass function must be known to an 
accuracy of ^ 1 per cent to constrain the time evolution of dark 
energy models with surveys such as DES (Wu et al. 2010). Reed 
et al. (2012) showed that this type of accuracy is achievable if diffi- 
cult for pure dark matter only simulations, but they only considered 
the FOF halo finder. However, the mass function depends on how 
haloes are identified (e.g. Tinker et al. 2008; Cohn & White 2008; 
Lukic et al. 2009; Knollmann & Knebe 2009; More et al. 2011; 
Bhattacharya et al. 2011; Knebe et al. 2011; Watson et al. 2012). 
Moreover, there is no guarantee that differences between halo find- 
ers remain fixed at higher redshifts - in our comparisons we only 
considered redshift z = 0. As a consequence, deviations from a 
universal behaviour will probably depend on applied halo finder 
(e.g Warren et al. 2006; Tinker et al. 2008; Courtin et al. 2011; 
Watson etal. 2012). 

Present and future large-scale structure surveys of the Uni- 
verse (just to name a few, BOSS, PAU, WiggleZ, eBOSS, Big- 
BOSS, DESpec, PanSTARRS, DES, HSC, Euchd, WFIRST, etc.) 
will aim to constrain the cosmological model and the true nature of 
dark energy with unprecedented accuracy. This very high required 
accuracy for surveys is primarily driven by the number density 
of clusters. Cluster mass dark matter haloes represent rare objects 
that lie on the exponential tail of the halo mass function. As they 
arise from extremely rare density fluctuations, not only are they 
probes of cosmological parameters such as Q.m, they are sensitive 
to any non-Gaussianities present in the primordial density field (e.g. 
Matarrese et al. 2000; Pillepich et al. 2010; Marian et al. 201 1). Ac- 
curately recovering cluster properties may present a special chal- 
lenge to certain object finders since many clusters are unrelaxed 
or in the process of merging. For instance, it has been shown that 
FOF groups will 'merge' before the equivalent haloes are found by 
algorithms based on isodensity contours, such as SO or AHF (e.g. 
Klypin et al. 201 1; Behroozi et al. 2013). Care must also be taken 
when unbinding candidate clusters since merging objects will pro- 
duce large velocity offsets, which subsequently will result in par- 
ticles being considered unbound to either of the two merging halo 
cores while still possibly being bound to the merging system as a 
whole. 

In order for the aforementioned surveys to be used for preci- 
sion constraints on the cosmological parameters the 2-point galaxy 
correlation function (or alternatively the power spectrum) needs to 
be determined to unprecedented accuracy (e.g. Smith et al. 2012). 



This also entails an unparalleled understanding of the galaxy bias 
(e.g. Nuza et al. 2012), non-linear effects (e.g. Chuang & Wang 
2012) and any non-Gaussianities present in the primordial den- 
sity field (e.g. Matarrese & Verde 2008; Jeong & Komatsu 2009; 
Pillepich et al. 2010). Using the correlation function for these stud- 
ies hinges on accurately determining the spatial distribution of 
haloes and subhaloes and understanding how galaxies are hosted 
in these dark matter potential wells. The spatial distribution of sub- 
haloes of different masses is particularly sensitive to the subhalo 
finder in question and will leave its imprint in the clustering prop- 
erties (Zentner et al. 2005; Tinker et al. 2010; Watson et al. 201 1). 
For instance, an analysis of the MultiDark simulation"" with ROCK- 
STAR, AHF, and BDM shows that missing subhaloes close to the 
centre of the host will lead to a dramatic decrease of the two-point 
correlation function on small scales (Knebe et al., in prep.). Un- 
fortunately this phenomenon has not arisen in any of the previous 
comparison projects as it requires a comparison of (sub-)halo find- 
ers in large-scale structure simulations with sufficient resolution to 
resolve subhaloes. Therefore, it is important to know the limitations 
of the halo finder applied to such large-scale simulations. 

LSS measurements and statistics also offer many avenues 
for investigating deviations from the standard ACDM cosmology. 
From the theoretical point of view, studying LSS in simulations 
with differing dark matter models, dark energy types or modified 
gravity theories (cf. also Section 6.7 below), we conclude from the 
results presented here and in previous comparisons that it is impor- 
tant to always use the same halo finder. Care should be taken when 
choosing that finder as some may behave poorly in certain situa- 
tions. For instance, if a particular cosmological simulation results 
in a very high merger rate and many filamentary structures, one 
might expect the mass function produced by FOF to be artificially 
biased to high masses due to the presence of filaments connecting 
haloes (see related discussion on halo mergers in Klypin et al. 201 1 ; 
Behroozi et al. 2013). Furthermore, one should in general be care- 
ful when studying questions such as the universality of the mass 
function or trying to reproduce simulation results with Excursion 
Set Theory (Zentner 2007). In both cases the halo finder should be 
specified and the main parameters given to the end-user who devel- 
ops the theory. 

From the observational point of view, all of these issues boil 
down to the fact that end-users should have a clear understanding 
of the main characteristics of the halo finder used in the Af-body 
simulation - and its limitations. The onus is on developers of halo 
finders to make the object selection criteria clear (and to clearly 
define the code's deficiencies) preferably even allowing observa- 
tional astronomers to apply the halo finder to their data. For that 
reason, for instance. Tinker et al. (2008) have argued in favour of 
using SO halo catalogues (or other like algorithms which use den- 
sity to define halo edges) over FOF halo catalogues, since defining 
haloes using isodensity contours offers a more direct comparison 
to X-ray observations of clusters. However, since this this is not al- 
ways possible and nevertheless also involves systematics, it is im- 
portant to know that (halo finder) errors will propagate to uncer- 
tainties in estimates of cosmological parameters, for example the 
dark energy equation of state. We close by remarking that it is not 
only the halo finding community that is under pressure to supply 
high-precision results: Smith et al. (2012) have recently shown that 
rigorous convergence testing of A'^-body codes themselves is also 
needed to meet the future challenges of precision cosmology. 
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6.3 Near-Field Cosmology 

Ever since the overmerging problem in simulations of cosmic struc- 
ture formation had been overcome (Klypin et al. 1999), the ob- 
served/simulated Milky Way satellite mass function has been used 
as a test for our standard cosmology, ACDM. It failed and im- 
mediately led to the so-called missing satellite problem (Moore 
et al. 1999; Klypin et al. 1999), i.e. an over-prediction of subhaloes 
as compared to observations. Further, the possibility of numeri- 
cally modeling and studying the dynamics of halo substructure has 
spawned a new 'industry' of (computational) Near-Field Cosmol- 
ogy, a term coined by Freeman & Bland-Hawthorn (2002). Since 
then a multitude of articles emerged all dealing with the analysis 
of subhaloes and their orbits within dark matter host haloes (e.g. 
Stoehr et al. 2002; De Lucia et al. 2004; Diemand et al. 2004; Gill 
et al. 2004; Gao et al. 2004; Ki'avtsov et al. 2004, for the first such 
studies). 

We have seen that the theoretical subhalo mass function suf- 
fers from variations of ~ 10 per cent arising from using different 
subhalo finders (cf. Fig. 2). Observationally its uncertainties are 
currently dominated by incompleteness effects at the low mass end 
(e.g. Simon & Geha 2007a; Walsh et al. 2009). In any case, these 
uncertainties are unlikely to account for the disparity first pointed 
out by Klypin et al. (1999) and Moore et al. (1999) and still not fully 
explained. The disagreement between the theoretical and observed 
satellite mass function has driven a large number of studies attempt- 
ing to reduce this disparity using baryonic physics (e.g. Nickerson 
et al. 201 1), e.g. invoking stellar feedback (e.g. Mac Low & Ferrara 
1999), reionization (e.g. Bullock et al. 2000; Benson et al. 2002), 
and ram pressure stripping (e.g Mayer et al. 2006), or by invok- 
ing different dark matter models (e.g. Colin et al. 2000; Bode et al. 
2001; Knebe et al. 2002, 2008; Maccio & Fontanot 2010; Lovell 
et al. 2012; Maccio et al. 2012) which have stronger effects on the 
abundance of subhaloes than the scatter introduced by using differ- 
ent halo finders. 

However, the cumulative mass (or luminosity) function is not 
the sole place where tensions exist between theory and observa- 
tions. The properties of the most massive subhaloes extracted from 
numerical simulations of Milky Way like haloes are at odds with 
those observed (e.g. Boylan-Kolchin et al. 2011; di Cintio et al. 
2011; Boylan-Kolchin et al. 2012; Vera-Ciro et al. 2012; Di Cin- 
tio et al. 2012). Constraining the masses of individual Milky Way 
satellites is observationally challenging due to the large uncertain- 
ties in the 3D velocity dispersion and the low number of member 
stars for the ultra-faint dwarf galaxies (errors of order 50 per cent; 
Simon & Geha 2007a). Still, upcoming surveys like GAIA'' are ex- 
pected to improve both the completeness and the accuracy of the 
mass determination of Milky Way satellites. Using Jeans modeling 
with various anisotropy parameters Wolf et al. (2010) have shown 
that the mass within the 3D deprojected half light radius M1/2 is 
an accurate mass estimator for dispersion supported systems. How- 
ever, they still quote uncertainties of 10-20 per cent, well above the 
scatter in individual subhaloes from numerical simulations seen in 
this study and hence also here code scatter is not the dominant prob- 
lem. 

The orbits of satellites and their alignment with the surround- 
ing environment also offers an avenue for (computational) Near- 
Field Cosmology (e.g. Knebe et al. 2004; Kang et al. 2005; Zentner 
et al. 2005; Libeskind et al. 2005, 2007; Knebe et al. 2008; Libe- 
skind et al. 2010; Knebe et al. 2010; Deason et al. 2011; Lovell 
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et al. 201 1; Knebe et al. 201 1; Libeskind et al. 2012; Wang & White 
2012). However, knowing the accurate 6D position and velocity of 
a (Milky Way) satellite is crucial for any kind of orbit determina- 
tion (Lux et al. 2010) with the exception of known tidal streams 
(e.g. Law & Majewski 2010). The positions and bulk velocities of 
subhaloes are well constrained, with accuracies of ;g 1 per cent. 
This is significantly better than the current observational constraints 
for both the satellites within the Milky Way and Andromeda (e.g. 
Lux et al. 2010; Watkins et al. 2013) which are needed to test the 
significance of the recently proposed thin plane of co-rotating sub- 
haloes in Andromeda (Ibata et al. 2013) and the previously reported 
disk of satellites in the MW (Kroupa et al. 2005; Metz et al. 2008, 
2009). Additionally, determining the centre of a galaxy and how 
that corresponds to the host halo's centre is not so clear cut ob- 
servationally. Different observational methods will give different 
results, with these differences being more pronounced for irregu- 
lar galaxies, such as the LMC and SMC. For galaxies close to the 
detection limit false/non-detection of member stars add to the un- 
certainties in the position of the centre. For example the centering 
of the ultra faint Milky Way dwarf galaxies Simon & Geha (2007b) 
that contain ~ 100 stars is accurate to a few arc seconds. This is 
better for the smaller dwarf galaxies that contain more stars, e.g. 
ultra compact dwarfs. 

In any case, given the observational challenge to determine 
positions, velocities and mass (even for future missions such as 
GAIA) as well as the complexity of the (baryonic) physics involved 
in properly modeling satellite galaxies the differences found across 
subhalo finders are the smallest source of uncertainty. 

6.4 Streams 

Haloes contain not only bound subhaloes but a wealth of substruc- 
ture such as tidal debris from disrupted subhaloes. Streams and 
other unbound structures may constitute an important component 
of the make-up of a halo (e.g. Carollo et al. 2007; Helmi 2008). 
Observationally the past years have seen the discovery of tremen- 
dous amounts of such structures, primarily in the stellar compo- 
nent: the Orphan stream (Belokurov et al. 2007; Sales et al. 2008; 
Newberg et al. 2010), the Monoceros stream (Newberg et al. 2002; 
Ibata et al. 2003; Yanny et al. 2003, 2004; Grillmair 2006; Grill- 
mair et al. 2008; Casetti-Dinescu et al. 2010), the Sagittarius stream 
(Ibata et al. 2001), moving groups (e.g. Hercules Corona Borealis, 
Harrigan et al. 2010), the Hercules-Aquila cloud (Belokurov et al. 
2007), the Cetus polar stream (Newberg et al. 2009), the Virgo stel- 
lar stream (Newberg et al. 2002; Martinez-Delgado et al. 2007; Vi- 
vas et al. 2008; Casetti-Dinescu et al. 2009; Prior et al. 2009), the 
Virgo overdensity (Martinez-Delgado et al. 2007; Juric et al. 2008), 
and the Pisces overdensity (Sesar et al. 2007; Watkins et al. 2009; 
KoUmeier et al. 2009; Sesar et al. 2010). All this observational work 
raises the question of whether or not we can also find these struc- 
tures in cosmological simulations. 

Unfortunately, typical (sub-)halo finders cannot be used to de- 
tect streams in galaxy formation simulations as most are configura- 
tion/density based finders. Such codes effectively collect particles 
by searching for clustering in configuration-space and remove false 
positives using an unbinding procedure, converting them to pseudo 
phase-space finders. As streams will not appear as an overdensity 
in configuration-space, these finders simply cannot identify them, 
despite their pseudo phase-space nature. 

In order to detect streams, previous methods have focused on 
tracking the particles of an accreted halo in time. However, requir- 
ing temporal information severely limits the search for streams and 
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such techniques cannot be directly apphed to observational data 
sets, which only provide an instantaneous snapshot of 'particle' 
(star) positions. Moreover, particles originating from the same pro- 
genitor need not be dynamically related to one another, especially if 
the progenitor has completed many orbits in an evolving potential. 
Tracking necessitates the application of a dynamically motivated 
criterion on the particles as opposed to an energy based one. 

However, there are several promising avenues for detecting 
streams without requiring temporal information (Sharma & Stein- 
metz 2006; Diemand et al. 2008; Zemp et al. 2009; Ascasibar 2010; 
Elahi et al. 201 1), and we recently studied the performance of sev- 
eral substructure - where substructure refers to both subhaloes and 
streams - detectors in a separate paper (Elahi et al., submitted). By 
including velocity-space information in the initial particle collec- 
tion, an object finder could in principle detect streams. The compli- 
cation lies in the fact that due to their unbound nature, one cannot 
use unbinding to remove false positives. Naturally, due to the more 
complex phase-space volumes occupied by these structures, these 
methods may be worse than currently used particle tagging meth- 
ods at identify streams in cosmological simulations (e.g. Warnick 
et al. 2008; Cooper et al. 2010; Helmi et al. 2011; Rashkov et al. 
2012). The advantage of these methods is that they do not require 
numerous snapshots, and could potentially be transferred to stream 
detection in real data sets, e.g. from SDSS^ or the upcoming GAIA 
mission. The comparison of various snapshot-based stream detec- 
tors to a full tracking code revealed that basic properties of the total 
substructure distribution (mass, velocity dispersion, position) are 
recovered with a scatter of ~20 per cent; and tidal debris with pu- 
rities of ~50 per cent ~ where purity is defined as the fraction of 
particles in debris originating from a distinct progenitor halo (Elahi 
et al., submitted). 

The wealth of extra information provided by identifying the 
tidal features associated with a subhalo and tidal debris has a num- 
ber of applications. For instance, the orientation and shape of a sub- 
halos tidal features could be incorporated into semi-analytic models 
of galaxy formation to determine the morphology of the satellite 
galaxy. The velocity distribution of tidal debris may have signif- 
icant ramifications for direct dark matter detectors (e.g. Fairbairn 
& Schwetz 2009; Kuhlen et al. 2010, 2012, and see Section 6.6) 
We conclude that tidal debris fields and streams are an extremely 
important field of research in the near-future when observations 
will provide a means to fully facilitate them. Substructure finding 
is moving in the right direction with the first indications of success- 
fully utilizing stream properties as a possible discriminator between 
cosmological models, casting light on the nature of dark matter. 



6.5 Gravitational Lenslng 

Gravitational lensing is the astrophysical phenomenon whereby the 
propagation of light is affected by the distribution of mass in the 
universe. It therefore provides a unique and direct probe of the mat- 
ter distribution in and about cosmic structures such as dark matter 
haloes. Gravitational lensing actually comes in three flavours, i.e. 
strong, weak, and micro-lensing. Strong lensing leads to multiple 
images of a background sources whereas for weak lensing the field 
of the deflector is only strong enough to produce generic distortions 
detectable in a statistical sense. For historical reasons strong lens- 
ing events leading to very small angular separations between the 
multiple images (as produced by stars, for instance) is referred to 
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as micro-lensing and is not of immediate relevance for halo find- 
ing. The former two nevertheless are and hence shall be discussed 
further here. 

Strong Lensing In the strong lensing regime one can use the par- 
ticulars of the multiply-imaged background sources to infer the 
mass distribution of the lens. But this (iterative) process requires 
reliable models for these mass distributions which come primarily 
from simulations of cosmic structure formation. And these in turn 
made use of one or other halo finder to find the dark matter haloes in 
the first place. But it is apparent from our own and other studies that 
halo density profiles and the concentration-mass relation are sub- 
ject to biases introduced by the applied halo finder (e.g. Lukic et al. 
2009; More et al. 201 1; Falck et al. 2012; Bhattacharya et al. 201 1). 
Further, more sophisticated models have to drop the assumption of 
spherical symmetry and make use of the triaxial shape of the dark 
matter (e.g. Jing & Suto 2002; Oguri et al. 2005; Gavazzi 2005; 
Sereno & Zitrin 2012; Limousin et al. 2012) distribution. While 
there is no doubt that simulated (dark matter) haloes have triaxial 
shapes (Warren et al. 1992; AUgood et al. 2006; Hahn et al. 2007; 
Maccio et al. 2007; Kuhlen et al. 2007; Bett et al. 2007; Maccio 
et al. 2008; Knebe et al. 2008; Pereira et al. 2008; Knebe et al. 2008; 
Vera-Ciro et al. 201 1; Schneider et al. 2012) we have just seen that 
the scatter in the actual value of, for instance, the halo sphericity is 
at best as small as 5 per cent (cf. Fig. 6 and Table 4). And baryonic 
processes - routinely simulated these days too (cf. Section 2.7) - 
and the precise way how to measure shapes (cf. Zemp et al. 201 1) 
will certainly also influence the applicability of halo catalogues to 
strong lensing studies. 

While major improvements in the observations and numeri- 
cal simulations have not yet significantly alleviated the aforemen- 
tioned satellite crisis (cf. Section 6.3), new techniques - involving 
gravitational lensing - have been proposed for the indirect and di- 
rect detection of subhaloes, primarily studying flux ratio anoma- 
lies introduced by the presence of substructure (Mao & Schneider 
1998; Metcalf & Madau 2001 ; Dalai & Kochanek 2002; Koopmans 
2005; Vegetti & Koopmans 2009; Zackrisson & Riehm 2010; Met- 
calf & Amara 2012). However, the modeling of these substructures 
for gravitational lensing heavily depends on whether the simulation 
includes baryonic effects or not (Maccio et al. 2006; Xu et al. 2009) 
and certainly on the capabilities of the applied halo finder: will the 
finder be able to properly find all substructures? And this not only 
refers to bound subhaloes but also more diffuse streams that still 
might have a surviving core capable of strong lensing. Maybe the 
reported wnrfer-prediction of subhaloes in galaxy clusters in ACDM 
simulations (Maccio et al. 2006; Xu et al. 2009) is still related to 
halo finder issues? 

Weak Lensing Weak gravitational lensing has become one of the 
key probes of the cosmological model, dark energy, and dark mat- 
ter, providing insight into both the cosmic expansion history and 
large scale structure growth history (Kaiser & Squires 1993; Wilson 
et al. 1996b; Bartelmann & Schneider 2001; Schneider 2006; Das 
et al. 2012). Early work on measuring halo mass distributions using 
weak galaxy-galaxy lensing was performed by Kaiser & Squires 
(1993), Wilson et al. (1996b,a), Schneider & Bartelmann (1997), 
and Schneider & Rix (1997). Following these, Natarajan & Re- 
fregier (2000) proposed a technique for using weak gravitational 
lensing to measure the ellipticity of haloes. Recently a lot of effort 
has gone into the application of measuring the shear of the matter 
distribution by means of statistical distortions of the background 
images of galaxies (Mellier 1999; Refregier 2003; Schneider 2006; 
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Munshi et al. 2008). Examining the substructure content of galaxy 
clusters Natarajan et al. (2007) find good agreement between the 
distribution of substructure properties retrieved using their weak 
lensing analysis and those obtained from the Millennium simula- 
tion. And weak lensing is also being used to infer halo shapes both 
observationally (Natarajan & Refregier 2000; Hoekstra et al. 2004; 
Mandelbaum et al. 2006; Parker et al. 2007; van Uitert et al. 2012) 
as well as in simulations (Bett 2012). 

All this indicates that the same limitations coming from halo 
finders that apply to strong lensing will also affect the applications 
of weak lensing. In addition, the large-scale distribution of haloes 
will be relevant for the determination of cosmological parameters 
and hence problems in that area (cf. Section 6.2) naturally enter 
here, too. 



6.6 Dark Matter Detection 

Dark matter detection relies on two distinct avenues. Indirect 
searches are attempting to observe the possible secondary particles 
that originate from either the decay or the self-annihilation of dark 
matter particles. Direct detection experiments rely on identifying 
the nuclear recoil signature of a dark matter particle colliding with 
a target atom in the detector volume. Each of these methods are sen- 
sitive to the dark matter substructures present in dark matter haloes, 
though in different fashions. As the emission from dark matter de- 
cay or self-annihilation scale with the density and the square of the 
density respectively, subhaloes can enhance the signal relative to 
that predicted for a smooth dark matter halo (Stoehr et al. 2003; 
Diemand et al. 2006; Elahi et al. 2009; Kamionkowski et al. 2010; 
Pinzke & Pfrommer 2010; Maciejewski et al. 2011; Blanche! & 
Lavalle 2012; Gao et al. 2012). But the measured contribution of 
resolved substructure is affected by both the numerical resolution 
of the underlying simulation and the capability to correctly identify 
substructure. Direct detection is extremely sensitive to the local ve- 
locity distribution of dark matter, thus both the presence of bound 
subhaloes and unbound tidal streams can significantly distort the 
signals observed by these detectors. Consequently, for a theoreti- 
cal modeling of the expected signal, an accurate determination of 
the full substructure distribution function, from bound subhaloes 
to tidal debris, and how substructure alters the density profile and 
velocity distribution of a halo is very important. 

While we have seen here and in previous works (Onions et al. 
2012, 2013; Knebe et al. 2013) how different finders perform with 
respect to the identification of (sub-)haloes, a comparison of the 
radial profile of the respective (sub-)haloes would be required to 
shed light into the subject of dark matter detection: while differ- 
ent finders might in fact find the same objects with comparable 
masses, are the associated particles distributed in the same way? 
While one naively might answer 'yes' to this question, it should not 
be taken for granted. It all comes down again to the particle collec- 
tion method and unbinding procedure. For instance, different ways 
of calculating the reference escape velocity as a function radius 
during the unbinding might lead to preferential removal/keeping of 
particles in certain regions like the centre or the outskirts and vice 
versa. 

However, we are not arguing against the established notion 
that dark matter haloes follow a universal density profile (first re- 
ported by Navarro et al. 1995, 1996, 1997). But even this univer- 
sality has been questioned with the first indications of scatter in the 
profiles between haloes by Jing & Suto (2000) and Bullock et al. 
(2001). Presently the question of the precise value of the logarith- 
mic central slope is still debated - a question related to the 'cusp- 



core crisis of ACDM', i.e. the discrepancy between cuspy profiles 
as predicted by simulations and cored profiles as derived observa- 
tionally (see de Blok 2010, for a recent review). All we can do 
at the moment is to alert the reader to the fact that any possible 
dependence of the radial halo profile (and in particular the much 
sought-after central slope) on the applied halo finder has not been 
tested yet. As putative as this might be, it cannot be ruled out at this 
stage. 

Further, the inclusion of baryons in simulations as well as dur- 
ing the halo finding process will certainly alter the (central) den- 
sity profile (Blumenthal et al. 1986; Tissera & Dominguez-Tenreiro 
1998; Oilorbe et al. 2007; Romano-Diaz et al. 2008, 2009; Libe- 
skind et al. 2010; Duffy et al. 2010; Sommer-Larsen & Limousin 
2010; Brooks & Zolotov 2012; Di Cintio et al. 2012; Zemp et al. 
2012). But the uncertainties in the modeling of baryonic processes 
will likely leave us with a larger scatter than finder-to-finder varia- 
tions and hence should be of greater concern in the end. 

6.7 Modified Gravity Simulations 

Though possibly less developed than ACDM simulations, there 
now exist several suites of A'^-body simulations that solve the grav- 
itational evolution of particles according to a particular model that 
modifies general relativity (GR) as an alternative to dark energy 
(e.g. Schmidt et al. 2009; Zhao et al. 2011a,b; Li et al. 2012) or 
assume some other form of modifications to gravity and/or the ex- 
pansion of the Universe (e.g. Knebe & Gibson 2004; Llinares et al. 
2008; Li & Zhao 2009; Hellwing et al. 2010; Zhao et al. 2010; Car- 
lesi et al. 2011; Li & Barrow 2011; Carlesi et al. 2012; Hellwing 
et al. 201 1 ; Winther et al. 2012; Baldi 2012b). Currently, these stud- 
ies primarily focus on the (sub-)halo mass function, particularly the 
high mass end, using the frequency of massive bound objects as dis- 
criminators for these models (e.g. Lo Verde 8l Smith 2011; Hoyle 
et al. 201 1; Carlesi et al. 201 1; Baldi 2012a). 

However, along with the production simulation code the halo 
finding algorithm also requires adjustment in compliance with the 
adopted non-standard model: the calculation of many halo proper- 
ties (for example virial radius, rotation curve and thus Knax, spin, 
and concentration) assume general relativity (GR) and so must be 
modified for modified gravity (MG) simulations according to the 
specific model. This is obviously best left to the users and not 
the code providers to derive, but it is important that enough GR- 
independent (i.e. dynamical) parameters are included in halo cata- 
logs to allow for calibration. This also means that results for MG 
simulations depend strongly on the unbinding procedure, which 
itself depends on environment. For example, in most MG mod- 
els gravity is very different for field haloes and haloes in low- 
density environments, so the unbinding procedures in the above 
codes would be invalid, although, as noted in Section 4.1, the un- 
binding procedure has little impact on the mass of field haloes. In 
clusters, GR is recovered in most models, so in principle the un- 
binding would be the same. Regardless of the details, the unbinding 
procedure in MG models must be addressed as we saw in Section 
5.1.3 the absence of unbinding leads to large scatter in the subhalo 
mass functions and without it, a number of physical parameters, 
such as spin, are poorly recovered. 

6.8 Summary 

Fortunately, we find that most applications of (sub-)halo catalogues 
are not affected by the uncertainties discussed here. However, fu- 
ture goals of improved precision will demand higher accuracy in 
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the determination of (sub-)halo properties (see discussion in Sec- 
tion 5). It should be emphasized that the potential errors introduced 
by halo finders, as well as the interplay with observable quantities, 
come with the proviso that the baryonic physics and the biases and 
changes it introduces are well understood. This is by no means the 
case, but given that baryons are unlikely to drastically alter the per- 
formance of current halo finders or alter their systematic biases, 
we argue that testing and improving these object finders using pure 
dark matter simulations is sufficient for the time being. 



7 SUMMARY & CONCLUSIONS 

With the ever increasing size and complexity of fully self-consistent 
simulations of cosmic structure formation, the demands upon ob- 
ject finders for these simulations has simultaneously grown. These 
codes not only need to locate haloes residing in the cosmic web, 
they are also often required to (correctly) identify substructure liv- 
ing in the inhomogeneous background of their host haloes. The last 
decade, and in particular the last couple of years, has seen an im- 
mense boost in the number of techniques and codes specifically 
developed for these tasks. But while a lot of effort has been put into 
validating the results coming from different simulation codes (e.g. 
Frenk et al. 1999; Knebe et al. 2000; O'Shea et al. 2005; Agertz 
et al. 2007; Heitmann et al. 2008; Tasker et al. 2008), until re- 
cently it was unclear how different structure finders compare. To 
this extent we initiated the Halo Finder Comparison Project that 
gathered together all experts in the field and has so far led to a se- 
ries of comparison papers (Knebe et al. 201 1; Onions et al. 2012, 
2013; Knebe et al. 2013, Elahi et al. submitted), emerging out of 
two workshops.* 

In this overview article, we summarise the results of both 
workshops (partially published in the two previous comparison pa- 
pers, Knebe et al. (2011) and Onions et al. (2012)) and take the 
analysis one step further. We aimed at familiarizing the reader with 
the general concepts commonly applied for halo finding clearly 
separating methods from definitions: the method is intrinsic to the 
code whereas the definition of a given halo property should be in- 
dependent of the applied finder. While both - method and definition 
- should be physically motivated, the latter certainly is less techni- 
cal and should be made abundantly clear to any end-user of halo 
finders. For instance, the same matter distribution describing a dark 
matter halo could be assigned different masses if the definition for 
the edge (and hence mass) is not identical. Or put differently, even if 
two finders consider the same particles belonging to an object, they 
may still write different masses to their halo catalogue depending 
on the assumed values for A^cf and/or p^cf of Eq. (2); or they may 
even apply another mass/edge definition. 

Any further differences between finders now originate from 
various error sources, primarily the following two: 

• codes recover different values for the same property (even 
when using the same definition), and 

• codes find different (numbers of) objects. 

While all previous comparison projects mainly dealt with distribu- 
tion functions of properties, we were so far unable to disentangle 
the relative strength of these two sources. Here (i.e. in Sees. 4.2. 1 
through 4.2.5) we now focused on the former by restricting the 

* "Haloes going MAD" (http://popia.ft.uam.es/ 

HaloesGoingMAD) and "Subhaloes going Notts" (http: 
//popia. ft .uam.es/SubhaloesGoingNotts) 



analysis to only those objects that had been found by all finders. 
We further utilized a common post-processing pipeline that only 
deals with particle ID lists as returned by each code and calcu- 
lates all halo properties in the same manner to avoid contamina- 
tion from different definitions. Our results are best summarized in 
Table 4 indicating that for the most basic properties (i.e. position, 
velocity, Vinax. and possibly mass) and structures that are found by 
all participating finders the agreement across codes is at the 1 per 
cent level, sufficient for the so-called era of 'Precision Cosmology' 
(Smoot 2003; Primack 2005; Coles 2005; Primack 2007). Note, 
that these errors have been derived as lower limits for subhalo cata- 
logues based upon a common subset of objects and post-processing 
pipeline avoiding scatter from various distributions. On the other 
hand, finding distinct haloes rather than substructure is less chal- 
lenging and the errors can be expected to be of this order. 

More involved halo properties such as spin parameter suffer 
from a larger scatter (see also Onions et al. 20 1 3). However, we also 
need to acknowledge that the general (sub-)halo mass and Vmax- 
functions including all objects identified by each finder suffer from 
a larger scatter than given in Table 4. This is accounted for (though 
not explained) by different numbers of objects: while our common 
set consists of some 800 haloes. Table 3 clearly shows that there 
exist of the same order of objects not found by all the other codes. 
These 'excess' objects then lead to an upwards boost in the (cu- 
mulative) distribution functions. We have shown (Fig. 8) that the 
majority of these missing objects are small, containing less than 
a few hundred particles in general. Further, they occur at all radii 
and are not predominantly near the centre of the halo. We therefore 
suggest that if a well matched, high purity catalogue is required for 
the scientific study in question that a higher particle limit than the 
usual 20 is required and that adopting 300 particles, the limit usu- 
ally suggested for obtaining stable derived halo properties such as 
the spin parameter would be a good idea. 

We further investigated the possible origin of the differences 
in halo properties when using different finders by trying to decode 
the influence of varying methodologies in Section 5. We found that 
both the collection of particles and the particulars of the unbinding 
procedure have an impact. In particular, some codes return unbound 
particles to the pool of all particles to be considered bound to any 
other object whereas other codes completely remove unbound par- 
ticles from the set. However, the characteristics of how to obtain 
the potential entering the formula for the escape velocity during 
the unbinding appears to have only marginal effects. 

But this also brings us back to (some of) the points raised 
in Section 2.5 and discussed during the last "Subhaloes going 
Notts" workshop: what is the proper definition for the mass of a 
halo? Practically all codes prune their initial particle collection by 
some sort of unbinding procedure. But for subhaloes, for instance, 
'boundness' may not be that well-defined; and remember, all codes 
extract the subhalo particles and remove unbound particles as if the 
object were in isolation. But what is the right way to treat subhaloes 
then and remove particles that do not belong to it? One of the bullet 
points in Section 2.5 was to define objects in general (and not only 
subhaloes) dynamically, i.e. only those particles that stay with the 
halo over (at least) a dynamical time should be considered 'bound'. 
These thoughts naturally lead to potential refinements of halo find- 
ing techniques. 

Possible improvements have also been briefly touched upon 
in this article pointing towards halo tracking methods: most work- 
ers in the field and end-users of halo catalogues, respectively, are 
not only interested in single temporal snapshots of a simulation but 
also like to trace objects backwards (or forward) in time. And it has 



© 2010 RAS, MNRAS 000, 1^2 



28 Knebe et. al 



recently been shown that this approach can be used to actually 'cor- 
rect' halo catalogues (Behroozi et al. 201 1). Further, one of the halo 
finders presented here is in fact based upon this approach (HBT, 
Han et al. 2012). We reiterate that basically all the results presented 
here and elsewhere are based upon a single snapshot analysed at 
redshift z = Q. We leave comparisons at higher redshift where, for 
instance, mergers play a major role, to a future study/workshop. 

We closed the paper with a discussion of the relevance of halo 
finding for various astrophysical applications such as galaxy for- 
mation (either semi-analytical modeling or direct simulation), the 
interpretation of large-scale structure surveys, near-field cosmol- 
ogy, gravitational lensing, dark matter detection, (stellar) streams, 
and modified gravity models. While the requirements are quite di- 
verse we nevertheless conjecture that intrinsic uncertainties in the 
respective application might be larger than variations introduced 
by using different halo finders. For example, the adoption of differ- 
ent sub-grid physics to model galaxy formation will certainly lead 
to more pronounced variations in the final results than changing 
the halo finder for a given model. Nevertheless, we are not claim- 
ing that the observed finder-to-finder scatter should be neglected: 
only with credible and reliable halo catalogues can we adequately 
(and scientifically) address the open questions. And there appears 
to remain some work to be done to fully align the outcomes of the 
different halo finders. 

We conclude that while the agreement across different halo 
finding techniques is converging towards the requirements for pre- 
cision cosmology, there is still room for improvement. It remains 
unclear where part of the observed scatter stems from and why all 
finders do not find the same objects. We aim to address these issues 
at the next halo finder comparison workshop, due to take place in 
2014. 
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APPENDIX A: CODE PERFORMANCE 

One question that gets repeatedly asked is the relative speed of the 
different codes. Table Al gives the time taken in seconds to analyse 
the Aquarius A-5 and A-4 datasets at redshift 2; = on the same 
dedicated compute server. All codes used one thread and the same 
compiler. These numbers should be taken with a large pinch of salt. 
Por instance, HBT, a halo tracking finder, needs to analyse multiple 
outputs to produce results (even though only the time for the 2 = 
analysis is reported here) and VOBOZ timings are dependent on 
the size of the surrounding region chosen. This included the entire 
simulation in the A-5 case, but subregions were analyzed in other 
cases. Mendieta also returned timing information, but only after 
the workshop (i.e. not run on the same machine as the other codes) 
stating that A-5 (A-4) took 328 (25844) seconds on a Xeon R5520 
cpu using only one core. Aquarius A-4 contains roughly 8 times 
more particles than Aquarius A-5, and all the codes returning data 
appear to scale relatively well. The take home message is simple: 
for a wide variety of theoretical approaches it is possible to extract 
a good subhalo catalogue from the Aquarius A-5 dataset in a few 
minutes, whereas for the A-4 dataset this process takes of order an 
hour even on a single processor. 



APPENDIX B: CROSS-CORRELATIONS 

Bl Procedure to obtain the common set 

In order to identify the common set of objects used in Section 4 
haloes were required to be within 250 kpc/h of the fiducial host 
center x = 57060.4/i"^ kpc, y = 52618. 6/i"^ kpc/h, z = 
48704. 8ft~^ kpc and to have more than 20 particles. Ror the pur- 
poses of the analysis in this paper, the counterpart to halo Ai in 
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Table Al. Timing results (in seconds) for analysing both Aquarius A-5 and A-4 at redshift 2 = on a dedicated compute server (all using one thread and the 
same compiler. 



Data AdaptaHOP AHF GRASSHOPPER HBT ROCKSTAR STF SUBFIND VOBOZ 



A-5 
A-4 



170 42 

1966 359 



108 
997 



— 24 52 209 3533 

2940 227 503 2238 7469 



catalog 1 from all haloes B in catalog 2 is computed by finding 
that object Bi in catalog 2 which maximizes a merit function Mi 



M, 



Na^Nb, 



(Bl) 



where NaiBi is the number of shared particles between halo Ai 
and halo Bi, Nai the number of particles in Ai and Nb^ the num- 
ber of particles in Bi. 

For each of the nine considered finders for this exercise a list 
containing the matches to each of the remaining eight finders has 
been generated. We then cross-references these lists restricting the 
analysis only to the set of objects found by every halo finder, i.e. a 
table has been created listing only those objects for each finder that 
form part of the common set. 



B2 Alternative Merit Functions 

The merit function deployed in this paper is based upon the 
MergerTree tool from the AHF halo finder distribution (Knoll- 
mann & Knebe 2009) and has been previously used successfully 
(Klimentowski et al. 2010; Libeskind et al. 2010). To verify the 
suitability of this choice we present a brief comparison of alterna- 
tive merit functions. These are detailed in Table Bl and their suc- 
cess at maximizing the set of of counterparts is summarized in Ta- 
ble B2. Note that for this exercise we worked with the Aquarius A-5 
data set and hence the lower numbers than those given in Table 3. 
We can see that our standard merit function Mi and the somewhat 
simpler A/" have exactly the same performance; indeed their global 
set of common objects identified is identical. This is due to the 
fact that the host halo is not included in the cross-correlation: the 
normalisation to NaiNbi in Mi is being used to avoid matching 
subhaloes to their host in case of inclusive particle ID lists. Other, 
more complicated metrics only degrade performance. We note that 
the common set of objects identified by M^ and M[ are proper 
subsets of the global set of common objects identified by Mi. And 
please note that the rather low number for the common set is de- 
termined by HOT3D that only found a total of 58 subhaloes ([cf. 
Table 1 in Onions et al. 2012). 



APPENDIX C: CODE DESCRIPTIONS 

While the information presented here can be found in various other 
publications, we nevertheless considered it helpful to compile it in 
one single place here. We are giving here a very brief and hope- 
fully concise description of all those halo finders that participated 
in any of the comparison projects "Haloes going MAD" (Knebe 
et al. 2011), "Subhaloes going Notts" (Onions et al. 2012), and 
"Galaxies going MAD" (Knebe et al. 2013). And to facilitate the 
understanding of the mode of operation of each of the codes, we 
present a brief summary in Table CI, too. 



Table CI. Brief summary of the codes listing the dimensionality of the 
(primary) metric to find objects, the assumed geometry, whether the code 
features an unbinding procedure and can sufficiendy handle subhalo detec- 
tion. 



code 


metric 


geometry 


unbinding 


subhaloes 


6DFOF 


6D 


arbitrary 


no 


yes 


AdaptaHOP 


3D 


spherical 


no 


yes 


AHF 


3D 


spherical 


yes 


yes 


ASOHF 


3D 


spherical 


yes 


yes 


BDM 


3D 


spherical 


yes 


yes 


FOF 


3D 


arbitrary 


no 


limited 


GRASSHOPPER 


3D 


spherical 


yes 


yes 


HBT 


3D+time 


arbitrary 


yes 


yes 


HOT 


3D/6D 


arbitrary 


no 


yes 


HSF 


6D 


arbitrary 


yes 


yes 


JUMP-D 


3D 


spherical 


no 


yes 


LANL 


3D 


spherical 


no 


no 


Mendieta 


3D 


arbitrary 


yes 


yes 


NTROPYFOF 


3D 


arbitrary 


no 


no 


ORIGAMI 


3D 


arbitrary 


yes 


no 


pfof 


3D 


arbitrary 


no 


no 


pso 


3D 


spherical 


no 


no 


ROCKSTAR 


6D 


arbitraiy 


yes 


yes 


SKID 


3D 


spherical 


yes 


yes 


STF 


6D 


arbitrary 


yes 


yes 


SUBFIND 


3D 


arbitrary 


yes 


yes 


VOBOZ 


3D 


arbitraiy 


yes 


yes 



CI 6DFOF 

6DF0F is a simple extension of the well known FOF method 
which also includes a proximity condition in velocity space. Since 
the centres of all resolved haloes and subhaloes reach a similar peak 
phase space density they can all be found at once with 6DFOF. The 
algorithm was first presented in Diemand et al. (2006). The 6DFOF 
algorithm links two particles if the following condition 



(xi - X2) 
Aa;2 



+ 



(vi - V2) 

Au2 



< 1 



(CI) 



is fulfilled. There are three free parameters: Ax, the linking length 
in position space, Av, the linking length in velocity space, and 
A^'min, the minimum number of particles in a linked group so that 
it will be accepted. For Av — >■ c» it reduces to the standard FOF 
scheme. The 6DFOF algorithm is used for finding the phase space 
coordinates of the high phase space density cores of haloes on all 
levels of the hierarchy and is fully integrated in parallel within the 
MPI and OpenMP parallehsed code PKDGRAV (Stadel 2001). 

The centre position and velocity of a halo are then determined 
from the linked particles of that halo. For the centre position of 
a halo, one can choose between the following three types: 1) the 
centre-of-mass of its linked particles, 2) the position of the parti- 
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Table Bl. Counterpart merit functions. Note that some are combination of others and hence we show in bold-face those entering the actual comparison 
presented in Table B2. 



meiit function formula 



description 



Mi 

Mf 
Mf 

MV'(P) 

M?(P) 
MfiP) 

Mf 

Mf 
Mf 
Mf 
Mf 
M» 
Mf 
Mf 



Pa,b, = \Pa^ -PbA 



PaiB, 

'^PAiBi = 

Pai+Pb, 

PAiBi 



{PA^-PA^Bir + iPB^-PA^Bir 
2 



Mi + Mf 

Mi + Mf («„ax) 
Mf +Mf(l«max) 

E 



-^peiv, 



max, mass, 5 



„jMf(p) 
} Mf (p) 



^P6{fmax,mass,nvpart,6.c 

M, + Mf + M^^(v^.,„) 



normalized shared particles 

shared particles 

r is radial distance from Ai to Bi 

property P difference 

property P average 

property P standard deviation 

average property P value 

normalized property P value average 

downweighted property P comparison 

downweighted radius r'^ comparison 



Table B2. Number of haloes in the common set and found in excess of it for the Aquarius A-5 data set. 



metric 


common objects 










excess 


objects 












AHF 


HOT3D 


HOT6D 


HBT 


HSF 


ROCKSTAR 


STF 


SUB FIND 


VOBOZ 


M, 


39 


191 


19 


97 


189 


192 


233 


106 


175 


218 


Mf 


39 


191 


19 


97 


189 


192 


233 


106 


175 


218 


Mf 


18 


212 


40 


118 


210 


213 


254 


187 


196 


239 


Ml 


18 


212 


40 


118 


210 


213 


254 


187 


196 


239 


Mf 


19 


211 


39 


117 


209 


212 


253 


186 


195 


238 


Mf 


18 


212 


40 


118 


210 


213 


254 


187 


196 


239 


M" 


18 


212 


40 


118 


210 


213 


254 


187 


196 


239 


Mf 


1 


229 


57 


135 


227 


230 


271 


204 


213 


256 


Mf 


1 


229 


57 


135 


227 


230 


271 


204 


213 


256 


m/ 


1 


229 


57 


135 


227 


230 


271 


204 


213 


256 


Mf 


1 


229 


57 


135 


227 


230 


271 


204 


213 


256 



cle with the largest absolute value of the potential among its linked 
particles or 3) the position of the particle which has the largest local 
mass density among its linked particles. For the analysis presented 
here, we chose type 3) as our halo centre position definition. The 
centre velocity of a halo is calculated as the centre-of-mass veloc- 
ity of its linked particles. Since in 6DFOF only the particles with 
a high phase space density in the very centre of each halo (or sub- 
halo) are linked together, it explains the somewhat different halo 
velocities (compared to the other halo finders) and slightly offset 
centres in cases only a few particles were linked. 

Other properties of interest (e.g. mass, size or maximum of 
the circular velocity curve) and the hierarchy level of the individual 
haloes are then determined by a separate profiling routine in a post 
processing step. For example, a characteristic size and mass scale 
definition (e.g. r-2ooc and Af200c) for field haloes based on tradi- 
tional spherical overdensity criteria can be specified by the user. 
For subhaloes, a truncation scale can be estimated as the location 



where the mass density profile reaches a user specified slope. Dur- 
ing the profiling step no unbinding procedure is performed. Hence, 
the profiling step does not base its (sub-)halo properties upon parti- 
cle lists but rather on spherical density profiles. Therefore, 6DFOF 
directly returned halo properties instead of the (requested) particle 
ID hsts. 



C2 AdaptaHOP 

The code AdaptaHOP is described in Appendix A of Aubert et al. 
(2004). The first step is to compute an SPH density for each particle 
from the 20 closest neighbours. Isolated haloes are then described 
as groups of particles above a density threshold pt, where this pa- 
rameter is set to 80, which closely matches results of a FOF group 
finder with parameter b = 0.2. To identify subhaloes within those 
groups, local density maxima and saddle points are detected. Then, 
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by increasing the density threshold, it is a simple matter to decom- 
pose haloes into nodes that are either density maxima, or groups of 
particles whose density is between two values of saddle points. A 
node structure tree is then created to detail the whole structure of 
the halo itself. Each leaf of this tree is a local density maximum and 
can be interpreted as a subhalo. However, further post-processing 
is needed to define the halo structure tree, describing the host halo 
itself, its subhaloes and subhaloes within subhaloes. This part of 
the code is detailed in Tweed et al. (2009); the halo structure tree 
is constructed so that the halo itself contains the most massive lo- 
cal maximum (Most massive Sub maxima Method: MSM). This 
method gives the best result for isolated snapshots, as used in this 
paper. 

In more detail, AdaptaHOP needs a set of seven parameters. 
The first parameter is the number of neighbours n„ei used with a 
fcD-tree scheme in order to estimate the SPH density. Among these 
n„ei neighbours, the n^op closest are used to sweep through the 
density field and detect both density maxima and saddle points. 
As previously mentioned, the parameter pt sets the halo bound- 
ary. The decomposition of the halo itself into leaves that are to 
be redefined as subhaloes has to fulfil certain criteria set by the 
remaining four parameters. The most relevant is the statistical 
significance threshold, set via the parameter fudge, defined via 
((p) ~ Pt)l pt > fudge/ \^, where A'' is the number of particles 
in the leaves. The minimal mass of a halo is limited by the param- 
eter n^embers, the minimum number of particles in a halo. Any 
potential subhalo has also to respect two conditions with respect to 
the density profile and the minimal radius, through the parameters 
a and /e. These two values ensure that a subhalo has a maximal 
density pmax such as pmax > ct{p) and a radius greater than /j 
times the mean interparticle distance. We used the following set of 
parameters (n„ei ~ riuop ~ 20, pt = 80, fudge — A, a — 1, 
/e = 0.05, nmembeis = 20). It Is important to understand that all 
nodes are treated as leaves and must comply with aforementioned 
criteria before being further decomposed into separate structures. 
As for defining haloes and subhaloes themselves, this is done by 
grouping linked lists of particles corresponding to different nodes 
and leaves from the node structure tree. Further, the halo and sub- 
halo centres are defined as the position of the particle with the high- 
est density. The halo edge corresponds to the pt density threshold, 
whereas the saddle points define the subhalo edge. 

Please note that AdaptaHOP is a mere topological code that 
does not feature an unbinding procedure. For substructures (whose 
boundaries are chosen from the saddle point value) this may impact 
on the estimate of the mass as well as lead to contamination by host 
particles. 



C3 AHF 

The MPI-l-OpenMP parallelised halo finder AHF^ (AMIGA Halo 
Finder, Knollmann & Knebe 2009), is an improvement of the MHF 
halo finder (Gill et al. 2004), which employs a recursively refined 
grid to locate local overdensities in the density field. The identi- 
fied density peaks are then treated as centres of prospective haloes. 
The resulting grid hierarchy is further utilized to generate a halo 
tree readily containing the information which halo is a (prospec- 
tive) host and subhalo, respectively. We therefore like to stress that 
our halo finding algorithm is fully recursive, automatically iden- 
tifying haloes, sub-haloes, sub-subhaloes, etc. Halo properties are 



AHF is freely available from http : //www.popia . ft .uam. es/AHF 



calculated based on the list of particles asserted to be gravitationally 
bound to the respective density peak. To generate this list of parti- 
cles we employ an iterative procedure starting from an initial guess 
of particles. This initial guess is based again upon the adaptive grid 
hierarchy: for field haloes we start with considering all particles out 
to the iso-density contour encompassing the overdensity defined by 
the virial criterion based upon the spherical top-hat collapse model; 
for subhaloes we gather particles up to the grid level shared with an- 
other prospective (sub-)halo in the halo tree which corresponds to 
the upturn point of the density profile due to the embedding within 
a (background) host. This tentative particle list is then used in an 
iterative procedure to remove unbound particles: In each step of the 
iteration, all particles with a velocity exceeding the local escape ve- 
locity, as given by the potential based on the particle list at the start 
of the iteration, are removed. The process is repeated until no parti- 
cles are removed anymore. At the end of this procedure we are left 
with bona fide haloes defined by their bound particles and we can 
calculate their integral and profiled quantities. 

The only parameter to be tuned is the refinement criterion used 
to generate the grid hierarchy that serves as the basis for the halo 
tree and also sets the accuracy with which the centres are being de- 
termined. The virial overdensity criterion applied to find the (field) 
halo edges is determined from the cosmological model of the data 
though it can readily be tailored to specific needs; for the analy- 
sis presented here we used 200 x pcrit- For more details on the 
mode of operation and actual functionality we refer the reader to 
the two code description papers by Gill et al. (2004) and KnoU- 
maim & Knebe (2009), respectively. 

C4 ASOHF 

The ASOHF finder (Planelles & Quihs 2010) is based on the 
spherical overdensity (SO) approach. Although it was originally 
created to be coupled to an Eulerian cosmological code, in its ac- 
tual version, it is a stand-alone halo finder capable of analysing the 
outputs from cosmological simulations including different compo- 
nents (i.e., dark matter, gas, and stars). The algorithm takes advan- 
tage of an AMR scheme to create a hierarchy of nested grids placed 
at different levels of refinement. All the grids at a certain level, 
named patches, share the same numerical resolution. The higher 
the level of refinement the better the numerical resolution, as the 
size of the numerical cells gets smaller. The refining criteria are 
open and can be chosen depending on the application. For a gen- 
eral purpose, ASOHF refines when the number of particles per cell 
exceeds a user defined parameter. Once the refinement levels are set 
up, the algorithm applies the SO method independently at each of 
those levels. The parameters needed by the code are the following: 
i) the cosmological parameters when analysing cosmological sim- 
ulations, ii) the size of the coarse cells, the maximum number of 
refinement levels (Ni^^eis), and the maximum number of patches 
(Npatch) for all levels in order to build up the AMR hierarchy of 
nested grids, iii) the number of particles per cell in order to choose 
the cells to be refined, and iv) the minimum number of particles in 
a halo. 

After this first step, the code naturally produces a tentative list 
of haloes of different sizes and masses. Moreover, a complete de- 
scription of the substructure (haloes within haloes) is obtained by 
applying the same procedure on the different levels of refinement. 
A second step, not using the cells but the particles within each halo, 
makes a more accurate study of each of the previously identified 
haloes. These prospective haloes (subhaloes) may include particles 
which are not physically bound. In order to remove unbound par- 



(c) 2010 RAS, MNRAS 000, 1^2 



Structure Finding in Cosmological Simulations 37 



tides, the local escape velocity is obtained at the position of each 
particle. To compute this velocity we integrate Poisson equation 
assuming spherical symmetry. If the velocity of a particle is higher 
than the escape velocity, the particle is assumed to be unbound and 
is therefore removed from the halo (subhalo) being considered. Fol- 
lowing this procedure, unbound particles are removed iteratively 
along a list of radially ordered particles until no more of them need 
to be removed. In the case that the number of remaining particles is 
less than a given threshold the halo is dropped from the list. 

After this cleaning procedure, all the relevant quantities for 
the haloes (subhaloes) as well as their evolutionary merger trees 
are computed. The lists of (bound) particles are used to calculate 
canonical properties of haloes (subhaloes) like the position of the 
halo centre, which is given by the centre of mass of all the bound 
particles, and the size of the haloes, given by the distance of the 
farthest bound particle to the centre. 

The ability of the ASOHF method to find haloes and their 
substructures is limited by the requirement that appropriate refine- 
ments of the computational grid exist with enough resolution to 
spot the structure being considered. In comparison to algorithms 
based on linking strategies, ASOHF does not require a linking 
length to be defined, although at a given level of refinement the size 
of the cell can be considered as the linking length of this particular 
resolution. 

The version of the code used in this comparison is serial, al- 
though there is already a first parallel version based on OpenMP. 

C5 BDM 

The Bound Density Maxima (BDM) halo finder originally de- 
scribed in Klypin &. Holtzman (1997) uses a spherical 3D overden- 
sity algorithm to identify haloes and subhaloes. It starts by finding 
the local density at each individual particle position. This density 
is defined using a top-hat filter with a constant number of particles 
A^fiiter, which typically is A^fliter = 20. The code finds all maxima 
of density, and for each maximum it finds a sphere containing a 
given overdensity mass A/a = (47r/3)A/9cr-RA> where p^ is the 
critical density and A is the specified overdensity. 

For the identification of distinct haloes, the code uses the den- 
sity maxima as halo centres; amongst overlapping sphere the code 
finds the one that has the deepest gravitational potential. Haloes are 
ranked by their (preliminary) size and their final radius and mass 
are derived by a procedure that guarantees smooth transition of 
properties of small haloes when they fall into a larger host) halo 
becoming subhaloes: this procedure either assigns Ra or i?dist as 
the radius for a currently infalling halo as its radius depending on 
the environmental conditions, where i?dist measures the distance 
of the infalling halo to the surface of the soon-to-be host halo. 

The identification of subhaloes is a more complicated proce- 
dure: centres of subhaloes are certainly density maxima, but not all 
density maxima are centres of subhaloes. BDM eliminates all den- 
sity maxima from the list of subhalo candidates which have less 
than Nfatcr self-bound particles. For the remaining set of prospec- 
tive subhaloes the radii are determined as the minimum of the fol- 
lowing three distances: (a) the distance to the nearest barrier point 
(i.e. centres of previously defined (sub-)haloes), (b) the distance 
to its most remote bound particle, and (c) the truncation radius (i.e. 
the radius at which the average density of bound particles has an in- 
flection point). This evaluation involves an iterative procedure for 
removing unbound particles and starts with the largest density max- 
imum. 

The unbinding procedure requires the evaluation of the gravi- 



tational potential which is found by first finding the mass in spheri- 
cal shells and then by integration of the mass profile. The binning is 
done in log radius with a very small bin size of A log(i?) = 0.005. 

The bulk velocity of either a distinct halo or a subhalo is de- 
fined as the average velocity of the 30 most bound particles of that 
halo or by all particles, if the number of particles is less than 30. 
The number 30 is a compromise between the desire to use only the 
central (sub)halo region for the bulk velocity and the noise level. 

The code uses a domain decomposition for MPI paralleliza- 
tion and OpenMP for the parallelization inside each domain. 

C6 FOF 

In order to analyse large cosmological simulations with up to 2048"^ 
particles we have developed a new MPI version of the hierarchical 
Friends-Of-Friends algorithm with low memory requests. It allows 
us to construct very fast clusters of particles at any overdensity 
(represented by the linking length) and to deduce the progenitor- 
descendant-relationship for clusters in any two different time steps. 
The particles in a simulation can consist of different species (dark 
matter, gas, stars) of different mass. We consider them as an undi- 
rected graph with positive weights, namely the lengths of the seg- 
ments of this graph. For simplicity we assume that all weights are 
different. Then one can show that a unique minimum spanning tree 
(MST) of the point distribution exists, namely the shortest graph 
which connects all points. If subgraphs cover the graph then the 
MST of the graph belongs to the union of MSTs of the subgraphs. 
Thus subgraphs can be constructed in parallel. Moreover, the geo- 
metrical features of the clusters, namely the fact that they occupy 
mainly almost non-overlapping volumes, allow the construction of 
fast parallel algorithms. If the MST has been constructed all pos- 
sible clusters at all linking lengths can be easily determined. To 
represent the output data we apply topological sorting to the set of 
clusters which results in a cluster ordered sequence. Every cluster 
at any linking length is a segment of this sequence. It contains the 
distances between adjacent clusters. Note, that for the given MST 
there exist many cluster ordered sequences which differ in the or- 
der of the clusters but yield the same set of clusters at a desired 
linking length. If the set of particle-clusters has been constructed 
further properties (centre of mass, velocity, shape, angular momen- 
tum, orientation etc.) can be directly calculated. Since this concept 
is by construction aspherical a circular velocity (as used to char- 
acterise objects found with spherical overdensity algorithms) can- 
not be determined here. The progenitor-descendant-relationship is 
calculated for the complete set of particles by comparison of the 
cluster-ordered sequences at two different output times. 

The hierarchical FOF algorithm identifies objects at different 
overdensities depending on the chosen linking length (More et al. 
2011). In order to avoid artificial misidentifications of subhaloes 
on high overdensities one can add an additional criterion. Here we 
have chosen the requirement that the spin parameter of the subhalo 
should be smaller than one. All subhaloes have been identified at 
512 times the virial overdensity. Thus only the highest density peak 
has been taken into account for the mass determination and the size 
of the object, which are therefore underestimated. The velocity of 
the density peak is estimated correctly but without removing un- 
bound particles. 

C7 GRASSHOPPER 

GRASSHOPPER (Stadel, in prep.) is based on a reworking of the 
SKID group finder (Stadel 2001). It finds density peaks and sub- 
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sequently determines all associated bound particles thereby identi- 
fying haloes. Particles are slowly slid along the local density gra- 
dient until they pool at a maximum, each pool corresponding to 
each initial group. This first phase of GRASSHOPPER can be 
computationally very expensive for large simulations, but is also 
quite robust. Each pool is then unbound by iteratively evaluating 
the bind- ing energy of every particle in their original positions and 
then removing the most non-bound particle until only bound parti- 
cles remain. This removes all particles that are not part of substruc- 
ture either because they are part of larger scale structure or because 
they are part of the background. 

C8 HBT 

HBT (Han et al. 2012) is a tracing algorithm working in the 
time domain of each subhaloes' evolution. Haloes are identified 
with a Friends-of-Friends algorithm and halo merger trees are con- 
structed. HBT then traverses the halo merger trees from the earliest 
to the latest time and identifies a self-bound remnant for every halo 
at every snapshot after infall. Care has been taken to ensure that 
subhaloes are robustly traced over long periods. The merging hier- 
archy of progenitor haloes are recorded to efficiently allow satellite- 
satellite mergers or satellite accretion.^" 



C9 HOT 

This algorithm, still under development, computes the Hierarchi- 
cal Overdensity Tree (HOT) of a point distribution in an arbitrary 
multidimensional space. HOT is introduced as an alternative to the 
minimal spanning tree for spaces where a metric is not well defined, 
like the phase space of particle positions and velocities. Rather than 
assuming an Eucidean metric, distance estimates are based on the 
Field Estimator for Arbitrary Spaces (FiEstAS, Ascasibar & Bin- 
ney 2005; Ascasibar 2010), where the data space is tessellated one 
dimension at a time, until it is divided into a set of hypercubi- 
cal cells containing exactly one particle. In the HOT-i-FiEstAS 
scheme, objects correspond to the peaks of the density field, and 
their boundaries are set by the isodensity contours at the saddle 
points. At each saddle point, the object containing less particles is 
attached to the most massive one, which may then be incorporated 
into even more massive objects in the hierarchy. 

Exactly the same algorithm has been applied to the particle po- 
sitions only (HOT3D) and the full set of phase-space coordinates 
(H0T6D). In order to optimize the method for the specific problem 
of halo finding, a post-processing routine, akin to a 'hard' expecta- 
tion maximization, has been developed, where -Rmax and Vmax are 
computed for every object in the catalogue, and objects with more 
than 10 particles within i?max are labelled as (sub)halo candidates. 
Then, particles are assigned to the candidate that contributes most 
to the phase-space density at their location, approximating each 
candidate by a Hernquist (1990) sphere. The final catalog consists 
of all objects that contain more than five particles within JJmax and 
an associated density above 100 times the critical value. 

CIO HSF 

The Hierarchical Structure Finder (HSF, Maciejewski et al. 2009) 
identifies objects as connected self-bound particle sets above some 

^^ It should be noted that HBT had access to the full snapshot data for 
Aquarius-A. 



density threshold. This method consists of two steps. Each particle 
is first linked to a local DM phase-space density maximum by fol- 
lowing the gradient of a particle-based estimate of the underlying 
DM phase-space density field. The particle set attached to a given 
maximum defines a candidate structure. In a second step, particles 
which are gravitationally unbound to the structure are discarded 
until a fully self-bound final object is obtained. 

In the initial step the phase-space density and phase-space gra- 
dients are estimated by using a six-dimensional SPH smoothing 
kernel with a local adaptive metric as implemented in the EnBiD 
code (Sharma & Steinmetz 2006). For the SPH kernel we use A^sph 
between 20 and 64 neighbours whereas for the gradient estimate 
we use Ai'ngb = 20 neighbours. 

Once phase-space densities have been calculated, we sort the 
particles according to their density in descending order. Then we 
start to grow structures from high to low phase-space densities. 
While walking down in density we mark for each particle the two 
closest (according to the local phase-space metric) neighbours with 
higher phase-space density, if such particles exist. In this way we 
grow disjoint structures until we encounter a saddle point, which 
can be identified by observing the two marked particles and see- 
ing if they belong to different structures. A saddle point occurs at 
the border of two structures. According to each structure mass, all 
the particles below this saddle point can be attached to only one 
of the structures if it is significantly more massive than the other 
one, or redistributed between both structures if they have compa- 
rable masses. This is controlled by a simple but robust cut or grow 
criterion depending on a connectivity parameter a which is rang- 
ing from 0.2 up to 1.0. In addition, we test on each saddle point 
if structures are statistically significant when compared to Poisson 
noise (controlled by a /3 parameter). At the end of this process, we 
obtain a hierarchical tree of structures. 

In the last step we check each structure against an unbind- 
ing criterion. Once we have marked its more massive partner for 
each structure, we sort them recursively such that the larger part- 
ners (parents) are always after the smaller ones (children). Then we 
unbind structure after structure from children to parents and add 
unbound particles to the larger partner. If the structure has less than 
A^'cut = 20 particles after the unbinding process, then we mark it 
as not bound and attach all its particles to its more massive partner. 
The most bound particle of each halo/subhalo defines its position 
centre. 

Although HSF can be used on the entire volume, to speed up 
the process of identification of the structures in the cosmological 
simulation volume we first apply the EOF method to disjoint the 
particles into smaller FOE groups. 



Cll JUMP-D 

JUMP-D is a galaxy finder and not a sub-Aa/o finder and hence 
is treated differently than the other finders in this work. It aims at 
finding and measuring central and satellite galaxies within given 
host haloes, i.e. baryonic substructure objects within a sphere of 
given radius Rum about the centre of the host. To this extent the 
stellar and gas mass profiles are searched for jumps (and hence the 
name) in the three-dimensional cumulative mass profiles from the 
host halo centre out to the limiting radius i?iim (i.e. usually the host 
halo's virial radius). The jump detection criterion is based on the 
detection of changes in the first and second derivatives of the re- 
spective mass profiles in the r, 6 and (j) variables at the substructure 
locations corresponding to the humps they cause. For the stellar 
object, the jump in the stellar mass profile is used as a first satel- 
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lite detection (i.e., location and velocity), that is later on refined by 
searching for maxima in 6-dimensional phase-space within an al- 
lowance region about that first center, returning the object stellar 
sizes Tstar 35 Well. The jumps in the gas profile are then matched 
to the stellar objects and gas particles inside a spherical region de- 
fined by the radial extend of the gas jump (rgas) are then associated 
to the stellar object. Note that for the detection of the jumps only 
cold gas is considered. 

Please note that the approach of JUMP-D is substantially dif- 
ferent to halo finders in general. The code only locates a baryonic 
object ('galaxy') without considering the dark matter. To this ex- 
tent JUMP-D cannot be subjected to the common post-processing 
pipeline when it comes to subhaloes as that pipeline heavily re- 
lies on the embedding of satellite galaxies within dark matter sub- 
haloes. 



C12 LANL 

The LANL halo finder is developed to provide on-the-fly halo anal- 
ysis for simulations utilizing hundreds of billions of particles, and 
is integrated into the MC"^ code (Habib et al. 2009), although it can 
also be used as a stand-alone halo finder. Its core is a fast fcD-tree 
FOF halo finder which uses 3D (block), structured decomposition 
to minimize surface to volume ratio of the domain assigned to each 
process. As it is aimed at large-scale structure simulations (lOO-l- 
Mpc//i on the side), where the size of any single halo is much 
smaller than the size of the whole box, it uses the concept of 'ghost 
zones' such that each process gets all the particles inside its do- 
main as well as those particles which are around the domain within 
a given distance (the overload size, a code parameter chosen to be 
larger then the size of the biggest halo we expect in the simula- 
tion). After each process runs its serial version of a FOF finder, 
MPI based 'halo stitching' is performed to ensure that every halo is 
accounted for, and accounted for only once. 

If desired, spherical 'SO' halo properties can be found using 
the FOF haloes as a proxy. Those SO haloes are centred at the parti- 
cle with the lowest gravitational potential, while the edge is at Ra 
- the radius enclosing an overdensity of A. It is well known that 
percolation based FOF haloes suffer from the over-bridging prob- 
lem; therefore, if we want to ensure completeness of our SO sample 
we should run FOF with a smaller linking length than usual in or- 
der to capture all density peaks, but still avoid over-bridging at the 
scale of interest (which depends on our choice of A). Overlapping 
SO haloes are permitted, but the centre of one halo may not reside 
inside another SO halo (that would be considered as a substructure, 
rather than a 'main' halo). The physical code parameters are the 
linking length for the FOF haloes, and overdensity parameter A 
for SO haloes. Technical parameters are the overload size and the 
minimum number of particles in a halo. 

The LANL halo finder is being included in the standard dis- 
tributions of PARAVIEw'^^ package, enabling researchers to com- 
bine analysis and visualization of their simulations. A substructure 
finder is currently under development. 

C13 Mendieta 

The Mendieta finder is a Friends-of-Friends based finder that is 
used to obtain a dark matter halo. This prospective host halo is 
subsequently refined by looking at peaks of increasing density by 



reducing the linking length. This approach decomposes the halo 
into its substructure plus other minor overdensities. In a final pass 
pass unbound particles are removed by checking their associated 
energies. MENDIETA is described more fully in Sgro et al. (2010). 



C14 ntropyFOF 

The Ntropy parallel programming framework is derived from N- 
body codes to help address a broad range of astrophysical problems 
^^. This includes an implementation of a simple but efficient FOF 
halo finder, NTROPYFOF, which is more fully described in Gard- 
ner et al. (2007a) and Gardner et al. (2007b). Ntropy provides a 
'distributed shared memory' (DSM) implementation of a fcD-tree, 
where the application developer can reference tree nodes as if they 
exist in a global address space, even though they are physically dis- 
tributed across many compute nodes. Ntropy uses the fcD-tree data 
structures to speed up the FOF distance searches. It also employs 
an implementation of the Shiloach & Vishkin (1982) parallel con- 
nectivity algorithm to link together the haloes that span separate 
processor domains. The advantage of this method is that no single 
computer node requires knowledge of all of the groups in the sim- 
ulation volume, meaning that NTROPYFOF is scalable to petascale 
platforms and handle large data input. This algorithm was used in 
the mock halo test cases to stitch together particle groups found 
across many threads into the one main FOF halo. As FOF is a de- 
terministic algorithm, NTROPYFOF takes a single physical linking 
length to group particles into FOF haloes without any perform- 
ing particle unbinding or subhalo identification. The halo centres 
for the analysis presented here use centre-of-mass estimates based 
on the FOF particle list. Ntropy achieves parallelisation by call- 
ing 'machine dependent library' (MDL) that consists of high-level 
operations such as 'acquire_treenode' or 'acquire_particle.' This li- 
brary is rewritten for a variety of models (MPI, POSIX Threads, 
Cray SHMEM, etc.), allowing the framework to extract the best 
performance from any parallel architecture on which it is run. 



CIS ORIGAMI 

ORIGAMI (Order-Reversing Gravity, Apprehended Mangling In- 
dices, Falck et al. 2012) uses a natural, parameter-free definition 
of the boundary between haloes and the non-halo environment 
around them: halo particles are particles that have experienced 
shell-crossing. This dynamical definition does not make use of the 
density field, in which the boundary can be quite ambiguous. In one 
dimension, shell crossings can be detected by looking for pairs of 
particles whose positions are out-of-order compared with their ini- 
tial positions. In 3D, then, a halo particle is defined as a particle that 
has undergone shell crossings along 3 orthogonal axes. Similarly, 
this would be 2 axes for a filament, 1 for a wall, and for a void. 
There is a huge number of possible sets of orthogonal axes in the 
initial grid to use to test for shell-crossing, but we only used four 
simple ones, which typically suffice to catch all the shell-crossings. 
We used the Cartesian x, y, and z axes, as well as the three sets of 
axes consisting of one Cartesian axis and two (45°) diagonal axes 
in the plane perpendicular to it. 

Once halo particles have been tagged, there are many possi- 
ble ways of grouping them into haloes. For this paper, we grouped 
them on a Voronoi tessellation of final-conditions particle positions. 



^^ http://www.paraview.org/ 
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This gives a natural density estimate (e.g. Schaap & van de Wey- 
gaert 2000, VTFE, Voronoi Tessellation Field Estimator) and set of 
neighbours for each particle. Haloes are sets of halo particles con- 
nected to each other on the Voronoi tessellation. To prevent haloes 
from being unduly linked, we additionally require that a halo con- 
tain at most one halo 'core', defined as a set of particles connected 
on the tessellation that all exceed a VTFE density threshold. This 
density threshold is the only parameter in our algorithm, since the 
initial tagging of halo particles is parameter-free; for this study, we 
set it to 200 times the mean density. We partition connected groups 
of halo particles with multiple cores into haloes as follows: each 
core iteratively collects particles in concentric rings of Voronoi 
neighbours until all halo particles are associated. The tagging pro- 
cedure establishes halo boundaries, so no unbinding procedure is 
necessary. Also, we note that currently, the algorithm does not iden- 
tify subhaloes. We remove haloes with fewer than 20 particles from 
the ORIGAMI halo catalogue, and the halo centre reported is the 
position of the halo's highest-density particle. 



C16 pFOF 

Parallel FOF (PFOF) is a MPI-based parallel Friends-of-Friends 
halo finder which is used within the DEUS Consortium ^'' at LUTH 
(Laboratory Universe and Theories). It has been parallelized by 
Roy and was used for several studies involving large A^'-body sim- 
ulations such as Courtin et al. (2011); Rasera et al. (2010). The 
principle is the following: first, particles are distributed in cubic 
subvolumes of the simulation and each processor deals with one 
'cube', and runs Friends-of-Friends locally. Then, if a structure is 
located close to the edge of a cube, PFOF checks if there are par- 
ticles belonging to the same halo in the neighbouring cube. This 
process is done iteratively until all haloes extending across multiple 
cubes have been merged. Finally, particles are sorted on a per halo 
basis, and the code writes two kinds of output: particles sorted per 
region, particles sorted per halo. This makes any post-processing 
straightforward because each halo or region can be analysed indi- 
vidually on a single CPU server. PFOF was successfully used on 
32768 cores for more than one hundred snapshots with 8192'^ par- 
ticles each (DEUS Full Universe Run Alimi et al. 2012). In this 
article, the serial version was used for mock haloes and small cos- 
mological simulations, and the parallel version for larger runs. The 
linking length was set to 6 = 0.2 (however see Courtin et al. 201 1, 
for a discussion on the halo definition), and the minimum halo mass 
to 100 particles. And the halo centres reported here are the centre- 
of-mass of the respective particle distribution. 



C17 pSO 

The parallel spherical overdensity (pSO) halo finder is a fast, highly 
scalable MPI-parallelized tool directly integrated into the FLASH 
simulation code that is designed to provide on-the-fly halo find- 
ing for use in subgrid modeling, merger tree analysis, and adaptive 
refinement schemes (Sutter & Ricker 2010). The pSO algorithm 
identifies haloes by growing SO spheres. There are four adjustable 
parameters, controlling the desired overdensity criteria for centre 
detection and halo size, the minimum allowed halo size, and the 
resolution of the halo radii relative to the grid resolution. The algo- 
rithm discovers halo centres by mapping dark matter particles onto 



the simulation mesh and selecting cell centres where the cell den- 
sity is greater than the given overdensity criterion. The algorithm 
then determines the halo edge using the SO radius by collecting 
particles using the FLASH AMR tree hierarchy. The algorithm de- 
termines the halo centre, bulk velocity, mass, and velocity disper- 
sion without additional post-processing. PSO is provided as both 
an API for use in-code and as a stand-alone halo finder. 



C18 ROCKSTAR 

Rockstar'^'* is a phase-space based halo finder designed to max- 
imize halo consistency across timesteps; as such, it is especially 
useful for studying merger trees and halo evolution (Behroozi et al. 
2013). RoCKSTAR first selects particle groups with a 3D Friends- 
of-Friends variant with a very large linking length (6 = 0.28). For 
each main FOF group, RoCKSTAR builds a hierarchy of FOF sub- 
groups in phase space by progressively and adaptively reducing the 
linking length, so that a tunable fraction (70 per cent, for this anal- 
ysis) of particles are captured at each subgroup as compared to the 
immediate parent group. For each subgroup, the phase-space metric 
is renormalized by the standard deviations of particle position and 
velocity. That is, for two particles pi and p2 in a given subgroup, 
the distance metric is defined as: 
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where a^ and (7„ are the particle position and velocity dispersions 
for the given subgroup. This metric ensures an adaptive selection 
of overdensities at each successive level of the FOF hierarchy. 

When this is complete, ROCKSTAR converts FOF subgroups 
into haloes beginning at the deepest level of the hierarchy. For 
a subgroup without any further sublevels, all the particles are as- 
signed to a single seed halo. If the parent group has no other sub- 
groups, then all the particles in the parent group are assigned to the 
same seed halo as the subgroup. However, if the parent group has 
multiple subgroups, then particles are assigned to the subgroups' 
seed haloes based on their phase-space proximity. In this case, the 
phase-space metric is set by halo properties, so that the distance 
between a halo h and a particle p is defined as: 
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where r^ir is the current virial radius of the seed halo and a„ is the 
current particle velocity dispersion. This process is repeated at all 
levels of the hierarchy until all particles in the base FOF group have 
been assigned to haloes. Unbinding is performed using the full par- 
ticle potentials (calculated using a modified Barnes & Hut method, 
Barnes & Hut (1986)); halo centres are defined by averaging parti- 
cle positions at the FOF hierarchy level which yields the minimum 
estimated Poisson error — which in practice amounts to averaging 
positions in a small region close to the phase-space density peak. 
For further details about the unbinding process and for details about 
accurate calculation of halo properties, please see Behroozi et al. in 
prep. 

ROCKSTAR is a massively parallel code (hybrid OpenMP/MPI 
style); it can already run on up to lO'' CPUs and on the very largest 
simulations (> 10^" particles). Additionally, it is very efficient, 
requiring only 56 bytes of memory per particle and 4-8 (total) CPU 
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^"^ ROCKSTAR is freely available from http : / /code . google . com/ 
p/rockstar 
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hours per billion particles in a simulation snapshot. The code is in 
the final stages of development; as such, the results in this paper are 
a minimum threshold for the performance and accuracy of the final 



C19 SKID 

SKID (Spline Kernel Interpolative Denmax)^'', first mentioned 
in Governato et al. (1997) and extensively described in Stadel 
(2001), finds density peaks within A'^-body simulations and sub- 
sequently determines all associated bound particles thereby iden- 
tifying haloes. It is important to stress that SKID will only find 
the smallest scale haloes within a hierarchy of haloes as is gener- 
ally seen in cosmological structure formation simulations. Unlike 
original DENMAX (Bertschinger & Gelb 1991; Gelb 1992) which 
used a fixed grid based density estimator, SKID uses SPH (i.e., 
smoothed particle hydrodynamics) kernel averaged densities which 
are much better suited to the Lagrangian nature of A^-body simu- 
lations and allow the method to locally adapt to the large dynamic 
range found in cosmological simulations. 

Particles are slowly slid (each step moving the particles by 
a distance of order the softening length in the simulation) along 
the local density gradient until they pool at a maximum, each pool 
corresponding to each initial group. This first phase of SKID can 
be computationally very expensive for large simulations, but is also 
quite robust. 

Each pool is then 'unbound' by iteratively evaluating the bind- 
ing energy of every particle in their original positions and then 
removing the most non-bound particle until only bound particles 
remain. This removes all particles that are not part of substructure 
either because they are part of larger scale structure or because they 
are part of the background. 

SKID can also identify structure composed of gas and stars 
in hydrodynamical simulations using the dark matter only for its 
gravitational binding effect. The "Haloes going MAD" meeting has 
motivated development of an improved version of the algorithm 
(now called GRASSHOPPER, see Section C7) capable of also 
running on parallel computers. 



C20 STF 

The STructure Finder Hierarchical Structure Finder (Elahi et al. 
201 1, (STF)) is a hybrid OpenMP-l-MPI code that identifies objects 
by utilizing the fact that dynamically distinct substructures in a halo 
will have a local velocity distribution that differs significantly from 
the mean, i.e. smooth background halo. This method consists of 
two main steps, identifying particles that appear dynamically dis- 
tinct and linking this outlier population using a Friends-of-Friends- 
like approach. Since this approach is capable of not only finding 
subhaloes, but tidal streams surrounding subhaloes as well as tidal 
streams from completely disrupted subhaloes, we also ensure that 
a group is self-bound. Particles which are gravitationally unbound 
to a candidate subhalo are discarded until a fully self-bound is ob- 
tained or the object consists of fewer than 20 particles, at which 
point the group is removed entirely. 



^^ Those interested in obtaining a copy of the code as well as a draft of the 
paper should contact the author at behroozi@stanford.edu. Current accept- 
able input formats for simulation files are ART. GADGET-2, and ASCII. 
^^ The OpenMP parallelized version of SKID can be freely downloaded 
from http : //www . hpcf orge . org 



C21 SUBFIND 

SUBFIND (Springel et al. 2001) identifies gravitationally bound, 
locally overdense regions within an input parent halo, traditionally 
provided by a FOE group finder, although other group finders could 
be used in principle as well. The densities are estimated based on 
the initial set of all particles via adaptive kernel interpolation based 
on a number Addons of smoothing neighbours. For each particle, the 
nearest A'^.^gb neighbours are then considered for identifying local 
overdensities through a topological approach that searches for sad- 
dle points in the isodensity contours within the global field of the 
halo. This is done in a top-down fashion, starting from the parti- 
cle with the highest associated density and adding particles with 
progressively lower densities in turn. If a particle has only denser 
neighbours in a single structure it is added to this region. If it is 
isolated it grows a new density peak, and if it has denser neigh- 
bours from two different structures, an isodensity contour that tra- 
verses a saddle point is identified. In the latter case, the two in- 
volved structures are joined and registered as candidate subhaloes 
if they contain at least A'^ngb particles. These candidates, selected 
according to the spatial distribution of particles only, are later pro- 
cessed for gravitational self-boundness. Particles with positive to- 
tal energy are iteratively dismissed until only bound particles re- 
main. The gravitational potential is computed with a tree algorithm, 
such that large haloes can be processed efficiently. If the remain- 
ing bound number of particles is at least A^ngb, the candidate is 
ultimately recorded as a subhalo. The set of initial substructure 
candidates forms a nested hierarchy that is processed from inside 
out, allowing the detection of substructures within substructures. 
However, a given particle may only become a member of one sub- 
structure, i.e. SUBFIND decomposes the initial group into a set of 
disjoint self-bound structures. Particles not bound to any genuine 
substructure are assigned to the 'background halo'. This compo- 
nent is also checked for self-boundness, so that some particles that 
are not bound to any of the structures may remain. For all sub- 
structures as well as the main halo, the particle with the minimum 
gravitational potential is adopted as (sub)halo centre. For the main 
halo, SUBFIND additionally calculates a SO virial mass around 
this centre, taking into account all particles in the simulation (i.e. 
not just those in the EOF group that is analyzed). There exist both 
serial and MPI-parallelized versions of SUBFIND, which imple- 
ment the same underlying algorithms. For more details we refer the 
reader to the paper by Springel et al. (2001). 



C22 VOBOZ 

Conceptually, a VOBOZ (VOronoi BOund Zones, Neyrinck et al. 
2005) halo or subhalo is a density peak surrounded by gravitation- 
ally bound particles that are down steepest-density gradients from 
the peak. A statistical significance is measured for each (sub)halo, 
based on the probability that Poisson noise would produce it. 

The only physical parameter in VOBOZ is the density thresh- 
old characterizing the edge of (parent) haloes (set to 200 times 
the mean density here), which typically only affects their mea- 
sured masses. To return a definite halo catalog, we also impose 
a statistical-significance threshold (set to 4-a here), although de- 
pending on the goal of a study, this may not be necessary. 

Density peaks are found using a Voronoi tessellation (par- 
allelizable by splitting up the volume), which gives an adaptive, 
parameter-free estimate of each particle's density and set of neigh- 
bours (e.g. Schaap & van de Weygaert 2000). Each particle is joined 
to the peak particle (whose position is returned as the halo centre) 
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that lies up the steepest density gradient from that particle. A halo 
associated with a high density peak will also contain smaller den- 
sity peaks. The significance of a halo is judged according to the ra- 
tio of its central density to a saddle point joining the halo to a halo 
with a higher central density, comparing to a Poisson point process. 
Pre-unbinding (sub)halo boundaries are defined along these density 
ridges. 

Unbinding evaporates many spurious haloes, and often brings 
other halo boundaries inward a bit, reducing the dependence on the 
outer density contrast. Particles not gravitationally bound to each 
halo are removed iteratively, by comparing their potential energies 
(measured as sums over all other particles) to kinetic energies with 
respect to the velocity centroid of the halo's core (i.e. the particles 
that directly jump up density gradients to the peak). The unbinding 
is parallelized using OpenMP. 

This paper has been typeset from a Tj5f/ MTgX file prepared by the 
author 
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